Saved "queries"

Started by Carlo Didier, January 24, 2018, 07:50:25 AM

Carlo Didier

I've had an idea for a long time concerning data-driven categories and performance problems.
I know this could be difficult to manage in the context of IMatch Anywhere, but I have always wanted something like database queries.

There are saved filters, but it might be interesting to have data-driven categories and queries which would work the same way but are only evaluated when requested, like an SQL query on a database.

Same for the category builder where I'd often wish I could easily save such a "query" for future use. I know, you can create a new category from the builder, but that would again be sort of a data driven category.

Maybe a "Queries" panel would be possible where the user could put saved filters, search results, @builder formulas etc and they would only be evaluated on demand when they are needed, i.e. when you click on a query.

Mario thinks:
Quote
The easiest way to do that is to copy the formula shown in the @Builder into a category formula. Create a root category "Queries" and below that any number of formula categories for queries you often need. Set them to "manual update" to not slow down the database and then update them when you need them. Works a treat.

But that's not what I want. The queries panel could contain a mix of data driven categories, saved filters including their scope (!) and built categories. The results would be evaluated at the moment you click on a query.
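
To make the idea concrete, here is a minimal Python sketch of a "query" that stores its filter and its scope but only computes results when it is run (for example, on a click). This is not IMatch code: `SavedQuery`, `predicate`, `scope` and the sample file records are all hypothetical names chosen for illustration.

```python
class SavedQuery:
    """A named query that is evaluated only on demand (hypothetical, not an IMatch API)."""

    def __init__(self, name, predicate, scope=None):
        self.name = name
        self.predicate = predicate                 # the filter / formula itself
        self.scope = scope or (lambda f: True)     # saved scope; defaults to "all files"
        self.evaluations = 0                       # how often the query was actually run

    def run(self, files):
        # Nothing is computed until this method is called.
        self.evaluations += 1
        return [f["name"] for f in files if self.scope(f) and self.predicate(f)]


# Made-up sample records standing in for database files.
files = [
    {"name": "a.nef", "ext": "nef", "folder": "2018/01"},
    {"name": "b.jpg", "ext": "jpg", "folder": "2018/01"},
    {"name": "c.nef", "ext": "nef", "folder": "2017/12"},
]

raw_2018 = SavedQuery(
    "RAW files from 2018",
    predicate=lambda f: f["ext"] == "nef",
    scope=lambda f: f["folder"].startswith("2018/"),
)

# Defining the query costs nothing; the work happens here.
result = raw_2018.run(files)
```

The point of the sketch is only that definition and evaluation are separate steps, which is the difference from a category that is kept up to date in the background.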

Mario

Quote
There's saved filters, but it might be interesting to have data driven categories and queries which would work the same but only evaluate when requested, like an SQL query on a database.

Tip: You can set both data-driven categories and formula-based categories to manual update. If this option is enabled, the categories only update when you explicitly update them via the context menu or keyboard shortcuts.

Carlo Didier

Quote from: Mario on January 24, 2018, 08:40:23 AM
Tip: You can set both data-driven categories and formula-based categories to manual update. If this option is enabled, the categories only update when you explicitly update them via the context menu or keyboard shortcuts.

I know, but you can't just left-click once to get the results (it takes several clicks or keys: left-click to select, then right-click and choose from the menu, or use a key combination), and you can't put different types of queries together in a specific panel.

JohnZeman

I'm not sure I understand what you're doing here but if your data driven categories are set for manual updating you can manually update any DD category by first selecting it then pressing SHIFT+F5

sinus

To be honest, I do not understand what you are talking about, Carlo, sorry.

I think this is first because of my lack of English and second because of my lack of technical understanding.
Best wishes from Switzerland! :-)
Markus

Mario

Quote from: JohnZeman on January 24, 2018, 11:14:04 PM
I'm not sure I understand what you're doing here but if your data driven categories are set for manual updating you can manually update any DD category by first selecting it then pressing SHIFT+F5

I think Carlo would like to have an "Update on mouse-click" feature, to avoid pressing <Shift>+<F5> when he wants a manual category to update.
In that case, what about the Category Panel? Same there?

In this case I would also have to add a prompt dialog, in case other users don't want this automatic update to happen. Probably we need another global option which controls this behavior.
It's never that simple.

@Carlo: Have you checked how long your manual categories need to update? IMatch logs that info to the log file. You can also search for #slow to find categories which take very long to update.
Since IMatch updates categories only when needed, and caches the result in memory, maybe you don't need to set your categories to manual at all?
This feature has been added for users with very large databases, many dd-categories which are not needed often...
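
As a rough illustration of "update only when needed and cache the result", here is a generic dirty-flag cache in Python. It is a sketch of the general technique, not of how IMatch implements it internally; `CachedCategory`, `invalidate` and `files` are hypothetical names.

```python
class CachedCategory:
    """Recompute a category only when its inputs changed (generic sketch, not IMatch internals)."""

    def __init__(self, compute):
        self._compute = compute      # function that builds the category content
        self._result = None
        self._dirty = True           # True means the cached result is stale
        self.recalculations = 0

    def invalidate(self):
        # Call this when metadata changed that could affect the category.
        self._dirty = True

    def files(self):
        # Recalculate only if something changed since the last call.
        if self._dirty:
            self._result = self._compute()
            self._dirty = False
            self.recalculations += 1
        return self._result


category = CachedCategory(lambda: ["a.nef", "c.nef"])
category.files()   # first access: computed
category.files()   # second access: served from the cache
```

With such a scheme, repeated access is cheap, and the cost is only paid again after a change that invalidates the result.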

Carlo Didier

Quote from: Mario on January 25, 2018, 08:54:10 AM
@Carlo: Have you checked how long your manual categories need to update? IMatch logs that info to the log file. You can also search for #slow to find categories which take very long to update.
Since IMatch updates categories only when needed, and caches the result in memory, maybe you don't need to set your categories to manual at all?
This feature has been added for users with very large databases, many dd-categories which are not needed often...

No, I never clocked it before now. I update them manually from time to time when there have been many changes. And because of performance issues with dd-cats I have written scripts to assign categories instead of the data-driven calculations. If only those scripts could be triggered by events as with IM5 ... (because then everything would be done in the background while importing, without noticeable performance impact).

I'm just used to working quite a lot with databases where I open a query which is then evaluated at that moment.
It would just be so much simpler to click on a "query" and see the results being evaluated and displayed without further ado. Right now, I have to go through several clicks or clicks and key combinations to do that. I was just thinking to simplify the procedure to get to a result (isn't iMatch 2017 all about simplification?).

Just for reference, my db holds about 92000 images and recalculating the few remaining data-driven cats (GEAR, i.e. Cameras and lenses, File Types and IPTC Location) takes around 1'30". The db is on an SSD.

sinus

Carlo, did you try to set your DD-categories to automatic update?
With 92'000 files this should not be a big deal.

If a manual update takes 1'30", I would try this, at least for a few days, to see whether and how it affects your work.
Best wishes from Switzerland! :-)
Markus

Mario

Quote from: Carlo Didier on January 29, 2018, 08:29:15 PM
Just for reference, my db holds about 92000 images and recalculating the few remaining data-driven cats (GEAR, i.e. Cameras and lenses, File Types and IPTC Location) takes around 1'30". The db is on an SSD.

I'm not sure that I understand how you work with IMatch or your "queries"... but the core concept of data-driven and formula-based categories is that they are kept up-to-date automatically and that IMatch performs these updates in the background. Setting a category to manual (and thus somehow making it a query that needs to be run explicitly by the user) should be an exception.

A database with 92K files is small.
Please show me a log file from a session where you recalculated your DD cats. 1 minute 30 is very slow.
Are you sure your virus checker is not interfering? Such a small DB on an SSD should perform a lot better.

I've just made a few tests, using a database with ~80,000 files (RAW/JPEG, mixed other files) on a Samsung Pro SSD.

Recalculating some of the default IMatch categories (ISO, Location, Lens). Average time to recalculate these data-driven cats is 0.5 seconds each!
Refreshing all data-driven categories takes maybe 5 seconds, including all the manual UI logic with the dialog and progress bar etc.
Even if your database has "more" data for the image files, it should not take that long. Or do you use super-duper complex data-driven categories with five levels, variables on every level?

On my largest database with 420,000 files, refreshing all data-driven categories (the standard IMatch categories plus some I've added) takes about 11 seconds on a "cold" database.
A variable-based data-driven category I've created for testing purposes takes less than 7 seconds to update, for 420,000 files!

Your 1 minute 30 seconds for a 90K files database seems really slow...

sinus

Quote from: Mario on January 30, 2018, 08:55:43 AM
Quote from: Carlo Didier on January 29, 2018, 08:29:15 PM
Just for reference, my db holds about 92000 images and recalculating the few remaining data-driven cats (GEAR, i.e. Cameras and lenses, File Types and IPTC Location) takes around 1'30". The db is on an SSD.

Your 1 minute 30 seconds for a 90K files database seems really slow...

For MY database with 230'000 files this is roughly at the same level. The DB is on an SSD, but the files are on a normal HD.
So yes, the calculation for Carlo's DB seems a bit slow.

The slowest DD category in my case is the calculation of the formats; this takes 33 seconds.
Some other DD categories take between 5 and 20 seconds.

But since the calculation is in the background, I think, this is not a big deal.
Best wishes from Switzerland! :-)
Markus

Mario

@sinus

What kind of category is "formats"? This is not a standard IMatch category. How is it defined?

Data-driven categories depend on the I/O performance of your system (disk, memory, controller). To re-calculate a category, e.g. for ISO, IMatch needs to read the ISO metadata of all files, group the files by their ISO value and then create the categories.
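
The three steps described above (read one metadata value per file, group the files by that value, create one child category per group) can be sketched in Python as follows. This is only an illustration of the grouping logic; `build_data_driven_category` and the sample records are hypothetical, and real IMatch reads the values from its database rather than from dictionaries.

```python
from collections import defaultdict


def build_data_driven_category(files, tag):
    """Group file names by the value of one metadata tag (illustrative sketch only)."""
    groups = defaultdict(list)
    for f in files:                      # step 1: read the tag value of every file
        value = f.get(tag)
        if value is None:                # files without a value end up in no child category
            continue
        groups[value].append(f["name"])  # step 2: group files by that value
    return dict(groups)                  # step 3: one child category per distinct value


# Made-up sample records.
files = [
    {"name": "a.nef", "iso": 100},
    {"name": "b.nef", "iso": 400},
    {"name": "c.jpg", "iso": 100},
    {"name": "d.pdf"},                   # no ISO value at all
]

iso_categories = build_data_driven_category(files, "iso")
```

Since every file has to be read once, the run time is dominated by how fast the metadata can be fetched, which is why I/O performance matters so much here.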

I tried that again.
For my 80,000 files database I selected a tag that has a value for each file: photools.com::IMatch\101200\file.ext. This makes sure that all files in the database are processed.
This results in 38 child categories, one for each extension. The total runtime is 0.2s for the database operations and 0.27s for updating the categories. About 0.5 seconds in total. My computer is almost 3 years old now.

It is mandatory to exclude the folder containing the database from on-access virus checks. Make sure you did.

sinus

Quote from: Mario on January 30, 2018, 01:15:10 PM
@sinus

What kind of category is "formats"? This is not a standard IMatch category. How is it defined?


Sorry, Mario, you can see this in the attachment.
Maybe (and this is quite possible) I did not set it up very cleverly.
Best wishes from Switzerland! :-)
Markus

Mario

You have many replace masks.

Apparently you are trying to convert all extensions to lower case. This may cost some time.
It would probably be faster to use the Unify Spelling property and set it to "All lower case".

For my 80,000 files database, it takes 156 ms (about 0.2 s) to calculate with the "All lower case" option.

Tip: Search the log file for RefreshGroup-Completed to find the time.
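
If you prefer not to search the log by hand, a few lines of Python can pull out the matching lines. This sketch assumes only that the marker text appears somewhere in each relevant line; the log file location and the exact line layout are not specified here, and the sample lines below are made up and are not real IMatch log output.

```python
def find_marker_lines(lines, marker="RefreshGroup-Completed"):
    """Return all lines that contain the marker text, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if marker in line]


# Made-up placeholder lines; the real log format will differ.
sample_log = [
    "some unrelated entry\n",
    "placeholder entry mentioning RefreshGroup-Completed for category A\n",
    "another unrelated entry\n",
    "placeholder entry mentioning RefreshGroup-Completed for category B\n",
]

matches = find_marker_lines(sample_log)

# For a real file, pass an open file object instead of a list, e.g.:
#   with open(path_to_your_log, encoding="utf-8", errors="replace") as fh:
#       matches = find_marker_lines(fh)
```

The same function can be reused with `marker="#slow"` to look for the other search term mentioned earlier in this thread.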

sinus

Quote from: Mario on January 30, 2018, 04:14:14 PM
You have many replace masks.

Apparently you are trying to convert all extensions to lower case. This may cost some time.
It would probably be faster to use the Unify Spelling property and set it to "All lower case".

For my 80,000 files database, it takes 156 ms (about 0.2 s) to calculate with the "All lower case" option.

Tip: Search the log file for RefreshGroup-Completed to find the time.

Mario, cool!

Thanks a lot for your tip!


With my old replacement-stuff: 34 sec for 260'000 files
With your new "all lower case": 1,3 sec!

Unbelievable!

Best wishes from Switzerland! :-)
Markus

Mario

Quote
With my old replacement-stuff: 34 sec for 260'000 files
With your new "all lower case": 1,3 sec!

See?  :)

This was to be expected. Running so many regular expression searches for every file in the database does cost a lot of time.
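
A small Python sketch shows why the two approaches differ so much in cost. The replace masks below are hypothetical stand-ins (the real ones were only in the attachment), and Python's `re` module is used purely for illustration: every mask means one extra regular expression pass per file, while lower-casing is a single cheap string operation.

```python
import re

# Hypothetical replace masks, one per spelling variant that should be unified.
REPLACE_MASKS = [
    (re.compile(r"^JPG$"), "jpg"),
    (re.compile(r"^Jpg$"), "jpg"),
    (re.compile(r"^NEF$"), "nef"),
    (re.compile(r"^TIF$"), "tif"),
]


def normalize_with_masks(ext):
    """Apply every mask in turn: N regex passes per file."""
    for pattern, replacement in REPLACE_MASKS:
        ext = pattern.sub(replacement, ext)
    return ext


def normalize_lower(ext):
    """One built-in operation per file, regardless of how many variants exist."""
    return ext.lower()


samples = ["JPG", "Jpg", "NEF", "tif"]
with_masks = [normalize_with_masks(e) for e in samples]
with_lower = [normalize_lower(e) for e in samples]
```

Both lists end up identical for these samples, but the mask version does four regex searches per file and still misses any variant without its own mask (for example "Nef").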