IMatch 2018 Sneak Peek: Search Speed Improvement!

Started by Mario, March 19, 2018, 08:13:05 PM


Mario

Note: There are several other IMatch 2018 sneak peek posts in the Apps Board if you are interested.

On my to-do list for IMatch 2018 was an item labeled "faster search in file window". Although searching was OK-ish, it was never really blazing fast. And we all like blazing fast ;)
Features and enhancements like this are on my "always good" list: improvements in these areas are always welcome and benefit the majority of users. Time well spent.

I started by taking another look at the built-in search engine of the database system used by IMatch. It is very fast but has two major problems:

A  Updates to the search engine are somewhat slow (and it has to be updated every time you make changes to the IMatch database, which is basically all the time).
This can slow down IMatch considerably. Not good.

B  Only prefix searches. This is really a bummer.

The search engine gets its speed from specific algorithms which allow you to search for "tender" and find "tender", "tenderness", "bar tender", ... but not "bartender".
It can only find "words" that start with your search term. It cannot find data where your search term appears in the middle or at the end of another word. This is irritating and can also lead to misleading search results.
I've tried various things to overcome this, but all proved to be useless in the end.
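To make the "tender" vs. "bartender" limitation concrete, here is a minimal brute-force sketch of the two matching semantics. It is only an illustration, assuming that "words" are runs of ASCII letters and digits; a real full-text engine stores an inverted index rather than scanning strings, and none of these function names come from IMatch.

```python
import re


def words(text: str) -> list[str]:
    # Split text into lowercase "words" on anything that is not an ASCII letter or digit.
    return [w for w in re.split(r"[^0-9a-z]+", text.lower()) if w]


def prefix_match(term: str, text: str) -> bool:
    # Word-prefix search: a hit only if some word in the text starts with the term.
    term = term.lower()
    return any(w.startswith(term) for w in words(text))


def substring_match(term: str, text: str) -> bool:
    # Substring search: a hit if the term appears anywhere, including mid-word.
    return term.lower() in text.lower()


for s in ["tender", "tenderness", "bar tender", "bartender"]:
    print(f"{s!r:14} prefix={prefix_match('tender', s)} substring={substring_match('tender', s)}")
```

Only "bartender" differs: the prefix search misses it, the substring search finds it.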

So I was always looking for ways to speed this up while supporting the full power and flexibility of the IMatch search engine, and without slowing IMatch down when you make changes to the database.
And not only for the file window, but for the IMatch search engine in general, which is also used by the filter panel and IMWS!

I won't bother you with the technical details, but I have come up with a "trick" (ahem, an advanced algorithm) that makes this possible.

Searching the metadata for 50,000 files in 0.3 seconds? How does that sound? ;D

If the super index is not yet built when you run your first search on 50,000 or 100,000 files, you may have to wait one or two seconds longer. But then you get results very fast.
Under normal conditions the super index is ready in the time between you start entering your search term in the file window search bar and you press <Enter>.
IMatch is clever and updates the index in the background, if needed, when you start typing into the search bar. It takes about 1.5 seconds to update it for 50,000 files in the scope. Most users need more time to type the search term!

For my largest database with 460,000 files, I can search common metadata fields in about 4 seconds!

This super index can only work when you use the "Frequently used tags" search mode. If you need to search the entire database, no shortcuts are possible.

Question:

To make this impressive search speed possible, IMatch searches only frequently used tags like title, description, keywords, and about 30 others.
Which tags do you search usually? I mean, when you type something like "bla" in the file window search bar, which tags would you expect IMatch to search? Let me know so I can include them.

-- Mario
IMatch Developer
Forum Administrator
http://www.photools.com  -  Contact & Support - Follow me on 𝕏 - Like photools.com on Facebook

BanjoTom

The most important metadata fields for me, when searching for files, are: Title, Headline, Description, keywords, city, location, and state.  Of all of those I think "description" is probably the most important of all.   Others' mileage may vary . . .  ;)
— Tom, in Lexington, Kentucky, USA

sinus

For me:

Should be there for sure (my opinion):

Filename (maybe this is already there, I think)

Headline
Keywords
Description
City


Nice to have:
{File.MD.XMP::dc\relation\Relation\0}
{File.MD.XMP::photoshop\TransmissionReference\TransmissionReference\0}
{File.MD.XMP::iptcExt\LocationShownSublocation\LocationShownSublocation\0}

Your enhancement sounds cool, of course!  :D

Best wishes from Switzerland! :-)
Markus

Jingo

For me: filename, keywords, caption, description, location data and that's about it... but I'm a pretty simple user!

Mario

Thanks to all who replied and let me know which tags they consider essential when searching for frequently used tags.

The trick for me is to search all relevant tags, but no superfluous tags.
The fewer tags searched in that mode, the faster the super index is updated, the faster the search is, and the less memory is used.
Currently IMatch searches:

xmp::dc
xmp::photoshop
xmp::iptccore
xmp::iptcExt
xmp::pdf

and selected xmp::tiff and EXIF tags, plus some Office, PDF and video tags.

The first group covers a lot, but much of it is unnecessary for most purposes. I'm trying to trim that down.

Ger

Sounds great.

And I think it's a good decision to limit to frequently used fields only (for me: same as mentioned by others, including filename, headline, description). Of course, the filter panel will allow for the more obscure fields. For fields like City I have a dynamic [Geography] category (Country > Province/State > City > Location) I normally use.

Ger

Mario

The search bar has 3 modes:

a) File name only (special high-speed search for file names only).
b) Everywhere (this covers all metadata in your database)
c) The "frequent tags" mode. This is what we use most and what should be as fast as possible.

I have now selected 59 tags for "frequent tags" mode. Maybe a few less, still pondering.

If the new super index is up to date, search times for 60,000 files are 0.3 seconds.
The search index builds in about 3 seconds for a 60,000 files database (on SSD).
For normal scopes (a few thousand files to search) the index update time and the search time are almost zero.

The index needs to be rebuilt when you change the scope or when you change the database.
This means that several searches performed without switching to another folder or modifying the database take 0.3 s each. Very fast for 50K files!
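Mario deliberately doesn't describe how the super index works internally, so the following is only a hypothetical sketch of the caching behavior described above: an index over a fixed set of "frequent tags", built for the current scope and reused until the scope or the database changes. `SuperIndex`, `FREQUENT_TAGS`, the `db_version` counter and the metadata layout are all assumptions made for illustration, not IMatch's API or algorithm.

```python
from dataclasses import dataclass, field

# Hypothetical subset of "frequent tags"; the list discussed here has about 60 entries.
FREQUENT_TAGS = ("title", "headline", "description", "keywords", "city")


@dataclass
class SuperIndex:
    """Hypothetical cache: one lowercase search string per file in the scope."""
    scope: tuple = ()
    db_version: int = -1  # assumed counter, bumped on every database change
    texts: dict = field(default_factory=dict)
    builds: int = 0  # number of (re)builds, for illustration

    def ensure(self, scope, db_version, metadata):
        # Rebuild only when the scope or the database changed since the last build.
        scope = tuple(scope)
        if scope == self.scope and db_version == self.db_version:
            return
        self.texts = {
            fid: "\x00".join(
                str(metadata[fid].get(tag, "")) for tag in FREQUENT_TAGS
            ).lower()
            for fid in scope
        }
        self.scope, self.db_version = scope, db_version
        self.builds += 1

    def search(self, term, scope, db_version, metadata):
        # Substring search over the cached strings, so "tender" also finds "bartender".
        self.ensure(scope, db_version, metadata)
        term = term.lower()
        return [fid for fid, text in self.texts.items() if term in text]


metadata = {
    1: {"title": "Bartender at work", "keywords": "bar; people"},
    2: {"description": "Tenderness", "city": "Lexington"},
    3: {"title": "Race car 22", "keywords": "motorsport"},
}
idx = SuperIndex()
print(idx.search("tender", [1, 2, 3], 7, metadata))     # [1, 2]
print(idx.search("lexington", [1, 2, 3], 7, metadata))  # [2], cache reused
print(idx.builds)                                       # 1
```

Joining the tag values with a NUL separator keeps a search term from matching across the boundary between two tags, and a shorter tag list means smaller cached strings, which is the trade-off between tag coverage, update time and memory discussed in this thread.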

Aubrey

You've covered all bases for my searches: filenames, keywords, possibly description.

No doubt you've thought about "user selection of tags" rather than hard wiring. I suppose that would introduce further complexity in an already feature rich application.

Looking forward to 2018, already saving my pennies!

Aubrey.

DigPeter


Mario

Quote from: Aubrey on March 20, 2018, 12:30:27 PM
No doubt you've thought about "user selection of tags" rather than hard wiring. I suppose that would introduce further complexity in an already feature rich application.
Yes, I did. But I dropped it for the reasons you stated: it would increase complexity and require a lot of work in the UI, new settings dialogs, resources, translations, help topics, ...
I'm quite sure only a few users will ever have a need for this. Better to keep this simple and when there is real demand later, add some sort of configuration feature. In general, I'm more into reducing options and switches. They confuse users  ;)

I think the ~50-60 tags I've selected cover everything most users, home and pro, will need to search. For deep searches there is the "Search Everywhere" mode and the Filter panel.

StanRohrer

I like your search category selection sets - with one exception. Race cars, race horses, race boats, and the like, are often identified by a number. So if I try to search "22" I'll get thousands of files where "22" was found in metadata other than Keywords or Description or Title. I would like you to add another search option for a very narrowly defined set of user input fields, e.g. Title, Description, Keywords.

For my workflow, I would like to come home from a race day with maybe 2300 files in a folder. I could go through fairly quickly and add the car number as a keyword. Then I could search the keywords by car number "22" and get all entries to further add title and description and more specific keyword data (such as driver name, sponsors, finish position, winner).

Note that this is still not a perfect solution, as searching for car "2" will return (with searches as currently defined) not only car "2" but also "02", "2A", "22", "23", "62", and many more.
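The race-number problem is the difference between substring matching and whole-token matching. A minimal sketch of both, assuming a "token" is delimited by anything other than an ASCII letter or digit; this is not an existing IMatch search option, it only illustrates what an "exact number" match would mean.

```python
import re


def substring_hits(term, values):
    # Behavior described above: the term may appear anywhere in the value.
    return [v for v in values if term in v]


def token_hits(term, values):
    # Alternative: the term must be a whole token, not preceded or followed by a letter or digit.
    pattern = re.compile(rf"(?<![0-9A-Za-z]){re.escape(term)}(?![0-9A-Za-z])")
    return [v for v in values if pattern.search(v)]


car_numbers = ["2", "02", "2A", "22", "23", "62", "Car 2 wins"]
print(substring_hits("2", car_numbers))  # all seven values match
print(token_hits("2", car_numbers))      # ['2', 'Car 2 wins']
```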

Mario

Quote
Keywords or Description or Title

Hm this would require a 4th mode, in addition to the other 3 modes.
Doable, surely. But then we'll need probably several other modes too...

You did not include headline in your set, probably because you don't use it. But others might prefer it, or even use both title and headline (for different purposes), and of course then search both.

What if somebody stores a reference number in the transmission reference (OTR) - this is very common in typical PJ / event workflows. We would also need a separate mode for that.
And what if a user wants to search only the EXIF lens tag, to find images taken with the 200mm lens but not files with 200 in other tags?

I could split the frequent tags group into smaller groups to cover different "scenarios".
And then allow a user to pick one or more of these groups as the search scope. Plus an "everywhere" group.

But how complicated would that be? How many users would have use for that? Would that be useful for many?
It would also require changes in the search engine, and this propagates to everywhere, from the file window to the filter panel to IMWS to scripting and apps.

I need to think about that. Will do. That's the reason for this discussion, to gather info about your search needs.



I know that Lr does a very basic version of that (they have four groups, if I recall correctly). But Lr has no IMatch Filter Panel, and that is much better suited for the task you explain.

When you add the keyword "22" you can easily select these files in the filter panel via

A. The Category Filter, using the @Keywords category "22"

or

B. A Metadata Filter on hierarchicalKeywords. This offers you only the keywords used in the current scope, so you should see the 22 immediately.

or

C. Maybe make a dedicated data-driven category which groups all files (or files from specific categories) by Starter number. Then you can use that in the Filter Panel, or you work "from" that category in the Category View.

Starter/Runner numbers are so specific, I would say you are faster when you use the Filter Panel for that instead of searching.

sinus

Quote from: Mario on March 20, 2018, 12:15:42 PM
The search bar has 3 modes:

a) File name only (special high-speed search for file names only).
b) Everywhere (this covers all metadata in your database)

I think it is clear that this is necessary and very good.

Quote from: Mario on March 20, 2018, 12:15:42 PM
c) The "frequent tags" mode. This is what we use most and what should be as fast as possible.

I have now selected 59 tags for "frequent tags" mode. Maybe a few less, still pondering.

I think the problem here is choosing the most-used tags.
But users use tags differently.

The best solution would be, of course, that the user could for example select a number of tags from all tags.
To limit this accordingly, this number of tags to be selected could be limited, for example to a maximum of 10 tags.
But whether that is technically feasible, I do not know.

Another solution, in my view, would be for you to create some groups, each with a fixed set of tags.
These could be some "typical" groups, like

The spartanic group
Headline
Description

The keyword-group
Headline
Description
Keywords

The technical group
all EXIF tags like camera, lens, ...

Some of such groups (maybe 5-10 groups) would probably suffice for most users.
The others still have the filter options and the "Everywhere" search (version b above).



Best wishes from Switzerland! :-)
Markus

Mario

The search bar in the file window group is intended for quick and easy searches.

For all else, use the Filter Panel.
In the filter panel you can select which tags you want to search, you can combine tags, you have multiple modes.
Trying to duplicate this for the simple file window search bar would be useless and would ruin the simplicity and quickness of this tool.

Quote
The spartanic group
Headline
Description

The keyword-group
Headline
Description
Keywords

No title? No headline?
What if I only want to search keywords? Or file names?
Did you not ask for the transmission reference above? You did not include that in your groups?

I think the idea of the file window search bar should be simplicity. Nothing you need to read 5 help pages and then configure for 10 minutes. Think about new or casual users who are new to IMatch or who use IMatch only every 2 weeks...

3 or 5 options and done. For all else, use the Filter Panel, which gives you all the control and features you request in your post, and more.

sinus

Quote from: Mario on March 20, 2018, 06:53:02 PM

Quote
The spartanic group
Headline
Description

The keyword-group
Headline
Description
Keywords

No title? No headline?
What if I only want to search keywords? Or file names?
Did you not ask for the transmission reference above? You did not include that in your groups?

Headline yes of course, it is in both groups above already.
I thought File Name was already covered by your mode a (in your post: "a) File name only (special high-speed search for file names only)").

Transmission reference, yes, that would be fine. To be honest, I thought it was too specialized, so I did not mention it again.  ;D
But of course I would like it, because I store some client-specific names in this tag.

Best wishes from Switzerland! :-)
Markus

Mario

I mean title, sorry. Title is used more often than headline. And the OTR is very important in many workflows so we need it anyway.

mastodon

Sublocation! It contains where it happened, ex. a specific square/stadium/building, home, workspace.

Mario

Quote from: mastodon on March 21, 2018, 09:29:47 PM
Sublocation! It contains where it happened, ex. a specific square/stadium/building, home, workspace.
This is already part of the frequent tags (together with Country, Country Code, City, State and Location). And if I decide to change the search bar to use "sets", it would be part of a "Search Location" set.