Creating a category alias takes ~30 seconds - is that normal?

Started by Carlo Didier, February 08, 2015, 05:38:10 PM


Carlo Didier

When I create a category alias using Ctrl-C and then Shift-Ctrl-L, it takes ~30 seconds to create and display the alias in the category list (even if thumbnail display is paused).
System is a Core i5-2300 @ 2.80GHz with 16GB RAM and the system, applications and the database on an SSD.
Debug log attached.

[attachment deleted by admin]

Mario

This is usually so fast it's not even logged.
I see no extended waits, blocks, errors or warnings (except a non-configured spell checker) in your log file.
The log file shows three groups being created, and none of the operations take more than 10ms.

What were you doing before?
You created an Alias for which kind of category (@Keywords perhaps)?

The only thing I see is that your category tree uses a lot of regular expressions, and re-calculating them can be a very, very expensive and time-consuming operation. If you really need to use them, always make sure you restrict the regexp to the smallest section of your category tree.

Carlo Didier

Quote from: Mario on February 08, 2015, 07:03:20 PM
This is usually so fast it's not even logged.
I see no extended waits, blocks, errors or warnings (except a non-configured spell checker) in your log file.
The log file shows three groups being created, and none of the operations take more than 10ms.
So it's not the creation of the alias itself.

Quote from: Mario on February 08, 2015, 07:03:20 PM
What were you doing before?
You created an Alias for which kind of category (@Keywords perhaps)?
For the log, I just started IMatch and created three aliases for simple, manually assigned categories.

Quote from: Mario on February 08, 2015, 07:03:20 PM
The only thing I see is that your category tree uses a lot of regular expressions, and re-calculating them can be a very, very expensive and time-consuming operation. If you really need to use them, always make sure you restrict the regexp to the smallest section of your category tree.
I thought I had removed all calculated cats based on regexes. I'll check that.

Thanks for the hints.

Carlo Didier

OK, there is one data-driven category left (see attached definition). But removing the regex, apart from giving me lots of unwanted results, didn't solve the delay problem.
Is there any simple way to identify how long each data driven category takes to update?
It would be good to be able to enable/disable a category, without having to delete and re-create it each time to test.

[attachment deleted by admin]

Carlo Didier

I think I found which categories might cause this. I have a number of cats which use RegEx to select all images from given holidays or events.
This is done by using a regular expression on the file names (which contain the creation dates) like this:
"@FileRegExp[^[a-zA-Z]20130(7(19|2[0-9]|3[0-1])|8(0[1-9]|1[0-5]))]" selects dates from 19.07.2013 to 15.08.2013
Would there be a better way to select files between two dates?
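A quick way to check which dates a pattern like this really matches is to test it outside IMatch. The sketch below runs the pattern from inside `@FileRegExp[...]` through Python's `re` module, assuming it is tested against the bare file name and that names follow a hypothetical `X20130719_0001.jpg` scheme (one letter, then the date as YYYYMMDD). IMatch's regex engine is not necessarily Python's, but the constructs used here (anchor, character classes, alternation) behave the same in common engines. The `in_range` helper is a hypothetical alternative that compares the date part as a string instead of encoding the range in the pattern.

```python
import re
from datetime import date, timedelta

# Pattern copied from inside @FileRegExp[...]; "^" anchors it at the start of the name.
EVENT_PATTERN = re.compile(r"^[a-zA-Z]20130(7(19|2[0-9]|3[0-1])|8(0[1-9]|1[0-5]))")


def matched_days(pattern, year=2013):
    """Return every day of `year` whose hypothetical file name matches `pattern`."""
    day = date(year, 1, 1)
    hits = []
    while day.year == year:
        name = f"X{day:%Y%m%d}_0001.jpg"  # hypothetical naming scheme
        if pattern.match(name):
            hits.append(day)
        day += timedelta(days=1)
    return hits


def in_range(name, start="20130719", end="20130815"):
    """Alternative: compare the YYYYMMDD part of the name as a string.

    Fixed-width, zero-padded dates sort lexicographically in date order,
    so a plain string comparison selects the same span.
    """
    stamp = name[1:9]
    return stamp.isdigit() and start <= stamp <= end


days = matched_days(EVENT_PATTERN)
print(days[0], days[-1], len(days))  # prints: 2013-07-19 2013-08-15 28
```

Running it shows that the August branch `8(0[1-9]|1[0-5])` extends the range up to 15.08.2013, and that the string comparison gives the same 28 days with the start and end dates stated directly.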

JohnZeman

Quote from: Carlo Didier on February 08, 2015, 10:43:26 PM
Would there be a better way to select files between two dates?

Have you tried a basic "From - To" date filter using the filter module?

Carlo Didier

Quote from: JohnZeman on February 08, 2015, 11:30:23 PM
Quote from: Carlo Didier on February 08, 2015, 10:43:26 PM
Would there be a better way to select files between two dates?

Have you tried a basic "From - To" date filter using the filter module?
Yes, but I don't want a filter, I want a category, because I try to manage everything with categories.

Ferdinand

Quote from: Carlo Didier on February 08, 2015, 11:37:19 PM
Yes, but I don't want a filter, I want a category, because I try to manage everything with categories.

I can relate to this.  But after an initial burst of enthusiasm about regex categories a while back, there was a flurry of posts about the problems they caused.  So I exported them (without file assignments) and then removed them from the DB.  If I need them, then I can either use a filter, as John suggested, or temporarily reimport them.  It's not as convenient, but it seems to cause fewer problems.

Mario

Regular expressions are not a problem per se. Users just need to understand what they are asking IMatch to do.

Using a regular expression which finds all files starting with "beach" requires IMatch to apply the regular expression to all file names in the database. In Carlo's case, that is about 90,000 files. Running a regular expression like

@FileRegExp[^[a-zA-Z]20130(7(19|2[0-9]|3[0-1])|8(0[1-9]|1[0-5]))]

90,000 times is quite a task. Once the results have been calculated, they are cached so future access to the category is fast.
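To make that cost concrete: a file-name regex has no index to use, so evaluating it means testing every file name once. The sketch below counts those tests on 90,000 synthetic file names (the naming scheme and the `RegexCategory` class are hypothetical) and caches the result, so a second access does no regex work at all. It illustrates the behaviour described above; it is not IMatch's actual implementation.

```python
import re
from datetime import date, timedelta

PATTERN = re.compile(r"^[a-zA-Z]20130(7(19|2[0-9]|3[0-1])|8(0[1-9]|1[0-5]))")

# 90,000 synthetic, unique file names spread over the days of 2013 (hypothetical data).
START = date(2013, 1, 1)
NAMES = [f"P{START + timedelta(days=i % 365):%Y%m%d}_{i:05d}.jpg" for i in range(90_000)]


class RegexCategory:
    """A formula category whose members are computed on first use and then cached."""

    def __init__(self, pattern):
        self.pattern = pattern
        self.cache = None      # None means "not calculated yet"
        self.regex_tests = 0   # how many file names have been tested so far

    def members(self, names):
        if self.cache is None:
            hits = set()
            for name in names:  # one regex test per file: cost grows with the DB size
                self.regex_tests += 1
                if self.pattern.match(name):
                    hits.add(name)
            self.cache = hits
        return self.cache


cat = RegexCategory(PATTERN)
first = cat.members(NAMES)   # full scan of all 90,000 names
second = cat.members(NAMES)  # served from the cache, no further regex tests
print(len(first), cat.regex_tests)
```

One category means one full scan; twenty such categories mean twenty full scans whenever their caches are empty.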

But IMatch has to throw away all cached categories frequently, e.g. when metadata is changed (which may affect many categories), or when categories are added/removed/renamed/moved/copied etc. This causes a delayed (!) recalculation of all formula-based categories (and of data-driven categories if metadata was changed).

Delayed means that IMatch calculates categories on-demand - when needed. But if the formula-based category is visible, or you use category counts, or you work in the category view, or you have the category panel open (or a combination of several of these) there is a good chance that creating an Alias forces a re-calculation of many categories at once. And this can take some time.
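The "clear everything, recalculate on demand" behaviour can be sketched as follows. The class and method names are hypothetical and each calculation is just a callable; the point is that a single structural change (here, adding an alias) empties every cache, and the next redraw of a panel that shows those categories pays for all of them at once.

```python
from typing import Callable, Dict, List, Optional, Set


class FormulaCategory:
    def __init__(self, name: str, compute: Callable[[], Set[str]]):
        self.name = name
        self.compute = compute              # expensive, e.g. a regex over all file names
        self.cached: Optional[Set[str]] = None
        self.evaluations = 0

    def members(self) -> Set[str]:
        if self.cached is None:             # delayed: only calculated when needed
            self.cached = self.compute()
            self.evaluations += 1
        return self.cached


class CategoryTree:
    def __init__(self) -> None:
        self.formulas: Dict[str, FormulaCategory] = {}
        self.aliases: Dict[str, str] = {}

    def add_formula(self, cat: FormulaCategory) -> None:
        self.formulas[cat.name] = cat
        self.invalidate_all()

    def add_alias(self, alias: str, target: str) -> None:
        self.aliases[alias] = target
        self.invalidate_all()               # any structural change clears every cache

    def invalidate_all(self) -> None:
        for cat in self.formulas.values():
            cat.cached = None

    def redraw(self, visible: List[str]) -> Dict[str, int]:
        # A panel showing counts forces calculation of every visible category.
        return {name: len(self.formulas[name].members()) for name in visible}


tree = CategoryTree()
for i in range(20):
    tree.add_formula(FormulaCategory(f"Event {i}", lambda i=i: {f"file{i}.jpg"}))
visible = list(tree.formulas)
tree.redraw(visible)                        # first display: 20 calculations
tree.redraw(visible)                        # all cached: no new calculations
tree.add_alias("Event 0 alias", "Event 0")
tree.redraw(visible)                        # the alias cleared the caches: 20 more
print(sum(c.evaluations for c in tree.formulas.values()))
```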

Category calculations are performed so often and are usually so fast that there is no logging.
But looking at what users are doing with advanced formulas like regexp or @Category (without starting at @All), I will add some code which times category calculations and logs those that take more than a few seconds. This will at least give us a way to tell users which categories are the problem.
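Timing each calculation and logging only the slow ones could look like the sketch below, which uses Python's `time.perf_counter` and `logging`. The 2-second threshold and the `timed_calculation` name are hypothetical stand-ins for "more than a few seconds"; the idea is that the normal fast path stays silent while a slow category produces a warning that names it.

```python
import logging
import time
from typing import Callable, Set

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("categories")

SLOW_THRESHOLD_S = 2.0  # hypothetical value for "more than a few seconds"


def timed_calculation(name: str, compute: Callable[[], Set[str]],
                      threshold: float = SLOW_THRESHOLD_S) -> Set[str]:
    """Run one category calculation and log it only if it exceeded `threshold`."""
    start = time.perf_counter()
    result = compute()
    elapsed = time.perf_counter() - start
    if elapsed > threshold:
        log.warning("Category %r took %.2f s (%d files)", name, elapsed, len(result))
    return result


def slow_regex_category() -> Set[str]:
    time.sleep(0.1)  # stand-in for scanning many file names
    return {"X20130719_0001.jpg"}


timed_calculation("Holiday 2013", slow_regex_category, threshold=0.05)  # logs a warning
```

Measuring around each calculation keeps the overhead to two clock reads, which matters when, as noted above, category calculations happen very often.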

Mario

Perhaps it would be better to create a data-driven category on dates and then use a @Category expression to pick out the part you are interested in. I haven't tried it, but it should be a lot faster.

Ferdinand

I haven't forgotten this post of yours, in which you expressed strong views.  I bet the OP hasn't either.  It's why I exported these cats and removed them:
https://www.photools.com/community/index.php?topic=3469.msg23059#msg23059

Mario

I've made a few quick tests with a 150,000+ files database.
The regular expression posted above is processed in less than 2 seconds (DB on an SSD, but that should not matter much because file names are cached in memory anyway).

I created several other regexps, e.g. one catching all files starting with a-d (9,000 files), but that also took less than 2 seconds (it was not logged).

Carlo Didier

Thanks everyone for your input!
I already have around 20 such categories using @FileRegExp and wanted to create a lot more. Looks like I have to look for another solution.
Filters and temporarily imported categories are out because it's way too complicated and makes the whole categorizing useless.
I'm often asked to get all the images with certain people from a certain event and with categories I can very quickly find all those images with the category builder by choosing the corresponding event and people categories.
Of course, I could manually select and assign event categories, but then I might forget some images, I need to be sure that those categories are propagated to versions, and if new files show up which are not versions, I have to remember to assign them manually too. Simply defining a date/time range for events is so easy and straightforward; that's why I chose that path.

As to the recalculation of the categories, I don't see any possible reason to recalculate anything if I create an alias for an existing category. I just create an alias. No files are re-assigned, added or removed, so no data driven or otherwise calculated category or collection needs to be refreshed. So why the re-calculation?

Mario

IMatch always clears the category cache when the category structure changes (you add a category). This is way more reliable than trying to figure out if and how many categories _may_ be affected by the most recent change done to the structure.

If you have many categories using FileRegExp, this adds up of course. The big red warning about FileRegExp in the help has a reason, after all.

As I said, you can do the same, but much faster, by setting up a data-driven category which produces year/month/day (or week) and then alias or reference those categories as needed to pick out your events.
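The data-driven approach can be sketched in plain Python: group the files once into year/month/day buckets, then describe an event as the union of the day buckets it spans. The file records and the `day_buckets`/`event_members` helpers are hypothetical; in IMatch this would be a data-driven category plus aliases or @Category references, not code. Grouping is one pass over the files, and each event afterwards is a handful of set unions instead of another scan of every file name.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, Iterable, Set, Tuple

Day = Tuple[int, int, int]  # (year, month, day), like a year/month/day category path


def day_buckets(files: Iterable[Tuple[str, date]]) -> Dict[Day, Set[str]]:
    """One pass over all files: put each file name into its year/month/day bucket."""
    buckets: Dict[Day, Set[str]] = defaultdict(set)
    for name, created in files:
        buckets[(created.year, created.month, created.day)].add(name)
    return buckets


def event_members(buckets: Dict[Day, Set[str]], first: date, last: date) -> Set[str]:
    """An event is the union of the day buckets from first to last (inclusive)."""
    members: Set[str] = set()
    day = first
    while day <= last:
        members |= buckets.get((day.year, day.month, day.day), set())
        day += timedelta(days=1)
    return members


files = [("IMG_0001.jpg", date(2013, 7, 18)),
         ("IMG_0002.jpg", date(2013, 7, 19)),
         ("IMG_0003.jpg", date(2013, 8, 15)),
         ("IMG_0004.jpg", date(2013, 8, 16))]
buckets = day_buckets(files)
print(sorted(event_members(buckets, date(2013, 7, 19), date(2013, 8, 15))))
# prints: ['IMG_0002.jpg', 'IMG_0003.jpg']
```

Membership here follows the recorded date, not the file name, so a file whose date lies outside the event's days would not be picked up.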

Also, since the events don't change, can't you just select one of your FileRegExp expressions and then assign all the files in that category an event category? You can then remove the FileRegExp. Basically a "do it once" approach.

Carlo Didier

Quote from: Mario on February 09, 2015, 02:00:44 PM
IMatch always clears the category cache when the category structure changes (you add a category). This is way more reliable than trying to figure out if and how many categories _may_ be affected by the most recent change done to the structure.
I understand. But creating an alias definitely does not require any re-evaluation. That's a pretty clear case. This might just be a "nice to have" feature request.
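The selective rule suggested here could be sketched as a small policy function. The `Change` kinds and the `SAFE_CHANGES` set are hypothetical; the sketch only encodes the argument that an alias points at an existing category and assigns no files. The catch is that a selective rule must be sure no formula depends on the changed part of the tree, which is exactly the bookkeeping the clear-everything rule avoids.

```python
from enum import Enum, auto


class Change(Enum):
    ADD_ALIAS = auto()
    ADD_CATEGORY = auto()
    RENAME = auto()
    MOVE = auto()
    DELETE = auto()
    METADATA = auto()


# Hypothetical policy: an alias only references an existing category,
# so it adds no files and (by this argument) changes no formula input.
SAFE_CHANGES = {Change.ADD_ALIAS}


def must_clear_formula_caches(change: Change) -> bool:
    """Return True when cached formula results can no longer be trusted."""
    return change not in SAFE_CHANGES
```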

Quote from: Mario on February 09, 2015, 02:00:44 PM
As I said, you can do the same, but much faster, by setting up a data-driven category which produces year/month/day (or week) and then alias or reference those categories as needed to pick out your events.
Good idea. Assembling a dozen categories (for a dozen days, for example) shouldn't be more complicated than writing the corresponding regex. But it could be unreliable for files added later, like a collage or banner created in Photoshop: it would belong to the event but have a creation date outside the referenced categories, while I would still put the correct date in the file name, so the regex would pick it up correctly. The same goes for scanned files.

Quote from: Mario on February 09, 2015, 02:00:44 PM
Also, since the events don't change, can't you just select one of your FileRegExp expressions and then assign all the files in that category an event category? You can then remove the FileRegExp. Basically a "do it once" approach.
See above.

Mario

Well, not even IMatch can do everything. Calculating a FileRegExp once for 90,000 files is no problem. But if you have dozens more categories with FileRegExp, you'll have to wait. I made a test with 20 FileRegExp categories on my 140,000-file test database and did not notice any slow-down when creating new categories, aliases or otherwise. How many categories of this kind do you have?