Slowness when updating keywords: #SLOWCATS in log.. ignore? delete? what to do?

Started by Jingo, May 18, 2020, 03:16:32 PM

Previous topic - Next topic

Jingo

So... I find myself often waiting and waiting for IMatch after doing something simple - like deleting a keyword from 16 images.  I get the spinning cursor and when I view the log file, I often see ##SLOWCAT for some of the standard categories.  I assumed turning off the parameter for background updating of these would prevent them from auto-updating ... but that apparently is only for the "true" data-driven variety.

So - is there any way to prevent these categories ("All Time Stamps", "No Keywords") from updating besides deleting them?  I've attached my log file for review in case there are other issues that I cannot see... at the end of the file, you can see it looks like it took 22 seconds to update a single keyword removal from 16 files... the software was "locked" to me during the time it was updating.

Thx - Andy.

Mario

Your database is very small, less than 100,000 files.
There should be no wait times at all. This system is not performing all too well.

How to you change keywords? In which panel?

Virtually all of the standard categories IMatch uses are reported as excessively slow to update.
20 seconds (!) to reload a file window, just because IMatch had to recalculate all formula-based categories.

Do you work in the category view? Many categories expanded and visible?
Do you have the category panel open? Many categories expanded and visible?
Note that IMatch only updates categories which are visible, so you may want to collapse the "IMatch Standard Categories" and "IMatch Workflow Categories" for a start.

Of course every time you change metadata (including keywords) all (or most) categories become invalidated. And when they are displayed, or used somehow (File Window layout, panels, filters) the have to be calculated immediately, before IMatch can continue.

But IMatch does that in the background, and in parallel, and there should be virtually no waiting time.
Unless you have produced a workspace of file Window layout (variables?) which have to wait for the categories to reload. And once that is done, you change a keyword, and all categories need to be calculated again.

Without more detailed info about your workspace and file window layout etc.- impossible to tell.
Is this a fast computer? Database on SSD? Virus checker configured to not scan the database folder or IMatch all the time?
All database operations seem quite slow for such a small database. 8 seconds for a simple search operation in the database (to fill one category)...

Or a really slow 16 (!) seconds to refresh @Keywords after changing keywords for 16 files. This is way, way too slow.

I've made some quick tests with a 650,000 files database on my development system (fast PC!):
Refresh @Keywords completely: 4 seconds.
Partial refresh of @Keywords after adding a new keywords to 100 files: < 0.5 seconds.
@SLOWCAT reported, but less than 2 seconds each (for 650,000 files).

Jingo

Yes.. I know... very slow and frustrating.. here are some answers:

I change keywords using the keywords panel - for the example in the log, I had 16 files which is just right clicked the keyword and chose delete.. then the green check.

I did have the category panel open but not active (ie: not in view) but I suppose that might have been the culprit... because once I closed that panel - all the SLOWCATs stopped showing in the log and things seem a bit zippier...  though - adding the panel back in and performing the same tests don't show SLOWCATs either.. sigh.

Database on SSD, no virus scanner... aging PC but still plenty of headroom for Imatch: i5, 16GB, SSD, 7200 RPM data drives

Did a compact and optimize and will keep an eye on things to see if they improve...  latest refresh of @Keywords:

05.18 13:21:56+    0 [363C] 10  I>                     RefreshGroup-Begin: '@Keywords'.
05.18 13:21:56+  328 [37D8] 02  I>                        Group-Query: 328ms
05.18 13:21:56+   63 [37D8] 02  I>                        132442 values retrieved from database. 0 children before in group
05.18 13:21:56+   31 [37D8] 02  I>                        10 children after in group
05.18 13:21:56+   31 [363C] 10  I>                     RefreshGroup-Completed: '@Keywords' in 453 ms.


Thx Mario... will see if I can figure it out!




Mario

0.5 seconds is a lot faster than 16 seconds. Which was exceptionally slow.

QuoteI did have the category panel open but not active (ie: not in view)

If the panel is not active/visible (do you use it docked, in a group, standalone window, other monitor, ...)? it does not update the counts, and hence does not cause a recalculation of categories. Also, when the parent category is closed, no child categories are calculated (unless they are needed otherwise, e.g. for your file window layout).

If you can produce a situation where IMatch again slows down so much on your system (where it runs 5 to 10 times slower with 90K files than my PC with 650K) let me know. May be a side effect or something. IMatch is doing awful lots in the background. And the simple task of "changing keywords" actually causes a lot of activity to keep everything up-to-date.

Tip: Delete 'expensive' categories you don't need.

Using no virus checker would be risky. Windows Defender usually works well and does not interfere (most of the time).

Jingo

Quote from: Mario on May 18, 2020, 08:55:01 PM
0.5 seconds is a lot faster than 16 seconds. Which was exceptionally slow.

QuoteI did have the category panel open but not active (ie: not in view)

If the panel is not active/visible (do you use it docked, in a group, standalone window, other monitor, ...)? it does not update the counts, and hence does not cause a recalculation of categories. Also, when the parent category is closed, no child categories are calculated (unless they are needed otherwise, e.g. for your file window layout).

If you can produce a situation where IMatch again slows down so much on your system (where it runs 5 to 10 times slower with 90K files than my PC with 650K) let me know. May be a side effect or something. IMatch is doing awful lots in the background. And the simple task of "changing keywords" actually causes a lot of activity to keep everything up-to-date.

Tip: Delete 'expensive' categories you don't need.

Using no virus checker would be risky. Windows Defender usually works well and does not interfere (most of the time).

Thx Mario - I will play around and see what I can see.

I actually do have Defender running along with defender firewall (via Malwarebytes addon)... I should have said "3rd party" antivirus... :-)

Jingo

Ok.. so, I do have a few observations that I think it related to my system slowness...

1 - I noticed that if I have a large category selected in the categories tab (not panel which is closed) - even if I am not on that tab (ie: I'm looking at a subset of images in a result window), then the system updates those categories.  While I can appreciate the need to so when that category is active - why would just having that category selected cause them to be recalculated?  If I switch to a category that has 0 images.. then, the system is much faster.

2 - I noticed when I have the Quick View window open, the systems takes between 1 and 2 seconds to load the preview cache... I am using a 100% JPG database so have the cache turned off (per recommendation in the preferences).  I can understand the system taking a bit to load an image and appreciate it trying to buffer future images to make the Quick View window faster... but 1-2 seconds per images cache seems very slow and is certainly causing performance issues on my machine.  When I close the Quick View - things speed up quite a bit.

3 - I noticed that when I apply a metadata template, there is a noticeable delay before I can then add keywords from the keywords panel to an image.  If I click in the keyword box and start to type, the keyword often appears but then disappears as the metadata template keywords are added...

Log attached... appreciate thoughts on these... Thx - Andy.

Mario

Category calculations can be caused by many things. From the category being selected, to counts being visible, to formulas requesting the category to the file window requesting the category or the category bar to...

Your log file does not contain any slow operations logged.
The average load time for cache images is about 0.6 seconds.

When you apply a metadata template, data in the database is modified. And this may require the Metadata Panel / Keyword Panel to update.
But IMatch usually locks the panel while a database update is running, exactly for this purpose.

Jingo

Ok.. so something is really really odd here... I have no preview panel open, no category panel open.  My collections Tab is set to an item with only 30 items.. my categories tab is set to 'rejected files' which is set to 0.

I deleted a keyword from 2 images using the keyword panel... I then added a keyword to 2 images.  Hitting the green checkmark, my system froze for about 12 seconds... the log shows it took 9.5 seconds to complete the "RefreshGroup-Completed: '@Keywords' for 2 files"... my question is why?

My database is on a SSD... I have plenty of free memory (8GB of 16GB free).... my CPU is doing basically nothing... I'm really confused by the performance I have been getting with a database of only 84,000 images.

I'm still seeing 2 #SLOWCATS# even though I don't have any category panels open nor is the category tab expanded or on a category with values (not sure if that even matters)....

Any thoughts Mario...?  Log attached.

sinus

Jingo

Is this behaviour also with the new version? (.6)
I did not try it, I know only, that the collection view with the old version is something curious and very slow...

I have not yet the time to install the newest verson (though downloaded)
Best wishes from Switzerland! :-)
Markus

Mario

@sinus

If you experience things like "that the collection view with the old version is something curious and very slow...",
please open a bug report, include a log file in debug mode + machine spec. Collections can be demanding to calculate - there is no free lunch and even IMatch has its limits.
Performance depends on many things, from the database size (files) to the disk to the number of processor cores, other apps running at the same time etc.

IMatch 2020 is the fasted IMatch version ever. I have completely revamped how it works with the database, removing dependencies and improving the concurrency behavior.
Which means IMatch can do more in parallel without saturating the database. Which is very good.

But I recall you do all kinds of fancy stuff in your file windows, with variables and whatnot. And this can severely impact overall system performance, because it forces IMatch to stop and update categories and collections and formulas very often.

Do you use the default settings for performance-related features?
Virus checker configured to exclude the database folder (90% of the "IMatch is slow" reports come from that).

So, a debug log file will show us more about how your system performs. And we see if you manage 50K files or 500K. And this may help us to give you some tips to improve performance.

@Jingo:

Your database is not performing too well.It has less than 90K files but database queries run slower as I usually see with databases having 200K or more files.
For such a small database on a SSD and with an i5, query performance should really be much better.

Have you expanded the "IMatch Standard Categories" and "IMatch Workflow Categories" nodes in the Filter Panel / Category View / Category Panel? This would cause a refresh of the categories often...

Categories are calculated on-demand and the results are cached until invalidated. Invalidation occurs for example when changes are done to Metadata. Which means keywords. But this would only invalidate all these data-driven and formula-based categories, not calculate them immediately. They will be re-calculated when needed. Which may be never, if they are not visible somewhere and not referenced anyway.
Do you use them somewhere? In a file window? Formula? Variables?

Just made some quick tests with a mid-size 240,000 files database.
On my new PC (m2 SSD for database, 12 core CPU) I see no #sl warnings at all. Of course this system is a lot faster than yours, but the typical query times required by the expensive IMatch Workflow categories are all way beyond one second (instead of 2+ ##slowcat is logged when a category takes more than 2 seconds to calculate). Despite the database being 3 times as large as yours.

sinus

Quote from: Mario on May 27, 2020, 02:05:57 PM
@sinus

If you experience things like "that the collection view with the old version is something curious and very slow...",
please open a bug report, include a log file in debug mode + machine spec. Collections can be demanding to calculate - there is no free lunch and even IMatch has its limits.
Performance depends on many things, from the database size (files) to the disk to the number of processor cores, other apps running at the same time etc.

Sorry, Mario
but I did not explain it good enough.
I spoke from the bug, what you have solved in the newest version (calculations are sometimes set to 0).
MAYBE this could be also related to Jingo's problem?!

It was of course a shot in the dark.

And sorry again, I have written a bit "not enough clear".
Best wishes from Switzerland! :-)
Markus

Mario

Quote from: sinus on May 27, 2020, 02:12:44 PM
but I did not explain it good enough.
I spoke from the bug, what you have solved in the newest version (calculations are sometimes set to 0).
MAYBE this could be also related to Jingo's problem?!

No. This was not a performance issue.
It was caused by a performance optimization that allowed IMatch better concurrency by calculating collections without locking certain structures by using temporary collection objects. But this also caused some collections not to update properly at all times, due to caching issues. This has been resolved, performance is unchanged, as good as before.

Jingo's problem is that formula-based categories which rely on metadata are quite slow. These are used (by default only) by the IMatch Workflow Categories. But unless these categories are all expanded or 'used' somewhere, they are almost never re-calculated.

claudermilk

Quote from: Mario on May 27, 2020, 02:05:57 PM
IMatch 2020 is the fasted IMatch version ever. I have completely revamped how it works with the database, removing dependencies and improving the concurrency behavior.
Which means IMatch can do more in parallel without saturating the database. Which is very good.

But I recall you do all kinds of fancy stuff in your file windows, with variables and whatnot. And this can severely impact overall system performance, because it forces IMatch to stop and update categories and collections and formulas very often.
Jumping to provide an ancedote on this.

I (well, my department, thus company) ran into a similar issue with out ERP system. It's highly customizable, and allows custom calculations all over the place. One of those places is with record updates. This can trigger our custom code with every record change. While it's really cool, we learned that it can also kill system performance. Even with a rack full of ridiculously fast servers and storage. We had to rethink our approach and back off all the fancy custom update code.

My point being, Mario has a good point and maybe reviewing custom calculations could help improve performance.

sinus

Thanks for your anectode.  :D

Well, I use really a lot of variables, and complicted variables.

But I did a lot of tests, it did never really slow down something.
IMatch is incredible fast for viewing thumbs with a lot of info there (with variables, icons and so on).

I can remember one time, when I did a regex-formula for categories.
This slowed down my whole computer really badly, after I saw, what it is, I deleted all these regex-stuff and IMatch was as quick as before.  :D

Best wishes from Switzerland! :-)
Markus

Jingo

Quote from: Mario on May 27, 2020, 02:05:57 PM

@Jingo:

Your database is not performing too well.It has less than 90K files but database queries run slower as I usually see with databases having 200K or more files.
For such a small database on a SSD and with an i5, query performance should really be much better.

Have you expanded the "IMatch Standard Categories" and "IMatch Workflow Categories" nodes in the Filter Panel / Category View / Category Panel? This would cause a refresh of the categories often...

Categories are calculated on-demand and the results are cached until invalidated. Invalidation occurs for example when changes are done to Metadata. Which means keywords. But this would only invalidate all these data-driven and formula-based categories, not calculate them immediately. They will be re-calculated when needed. Which may be never, if they are not visible somewhere and not referenced anyway.
Do you use them somewhere? In a file window? Formula? Variables?

Just made some quick tests with a mid-size 240,000 files database.
On my new PC (m2 SSD for database, 12 core CPU) I see no #sl warnings at all. Of course this system is a lot faster than yours, but the typical query times required by the expensive IMatch Workflow categories are all way beyond one second (instead of 2+ ##slowcat is logged when a category takes more than 2 seconds to calculate). Despite the database being 3 times as large as yours.

Ok... well, collapsing the tabs might have helped... I had categories and collections expanded on the category tab and collections tab.... I didn't realize that these tabs would come into play even without focus...  Once I minimized both tabs completely, it seems the long delays I've been experiencing with the re-calcs are better and I don't see the recalcs in the log. 

It does seem there is something strange with my database or system though that having these expanded would take so long ... having them closed all the time is not a problem as I do not use categories.. but it would be nice not to have that happen.  Honestly, I've often wanted to close these tabs and only load them when needed (or have them behave as panels).

Would information from the thumbnail display require a collection or category refresh as well?  As you mention, I do use a lot of display items to show the number of keywords or if an image has metadata.  These are just using file variables... but figured I would throw it out there.

I haven't updated to the very latest version yet... just saw that this morning so I am still using .5.4

Mario

QuoteBut I did a lot of tests, it did never really slow down something.

But your variables may force IMatch to re-calculate categories and/or collections more often than usual.
And this may cause performance issues.

It's like users creating 20 data-driven categories using complex variables. And ignoring the "set to manual" warnings in the IMatch help.
And then they contact me and complain about performance. And it takes 10 emails unless I figure out what they did.
This was the day I've added the ##SLOWCAT warning to the log file. As a hint for me to quickly see if categories take > 2s to update. Which might be a reason for bad performance. But not necessarily.

IMatch calculates categories in the background, and in parallel. And only when needed.
But if you produce a file window layout or panel layout which requires IMatch to constantly recalculate all collections and all categories after you've made a metadata change, performance will suffer. IMatch does some smart things to figure out which and when categories need to be updated. But there are limits, and IMatch rather updates instead of showing stale or wrong data.

Mario

Quote from: Jingo on May 27, 2020, 06:27:08 PM
Ok... well, collapsing the tabs might have helped... I had categories and collections expanded on the category tab and collections tab.... I didn't realize that these tabs would come into play even without focus...  Once I minimized both tabs completely, it seems the long delays I've been experiencing with the re-calcs are better and I don't see the recalcs in the log. 

IMatch does not update invisible Views (Category View or Collections), e.g. while the Media & Folders View is active.
But it has to update the category panel and category filer/collections filter etc. when they are visible.

If you can give some details about your workspace, I can look into this.
What exactly did you collapse, and where?
Which panels are visible / loaded into groups?
There are situations where updates are needed, even if a View is currently not visible. But these are very rare.

Update: I've made some tests here.
And if the Workflow Categories are not visible (parent collapsed) or neither the Category Panel nor the View is active / visible, the categories are not recalculated.
If they are recalculated often on your side, there must be something else that queries them...variables somewhere?

Jingo

Strange... here is my workspace:

Main Monitor -
  Media&Folders tab (almost always open this tab - all other non-visible tabs are typically not collapsed but I did collapse them for this experiment)

  Main thumbnail window (12 thumbs - with my custom thumbnail display details)

  Keywords Panel
  Filter Panel (all items closed - not enabled)
  App Panel (usually one of my apps)
  App Manager Panel
  Favorites Panel

2nd Monitor -
  Map Panel
  Metadata Panel
  Quick View Panel


So, no category Views are displayed nor collections (since the tabs are not focused).  The filter panel is open but not active.  I do not use categories so the categories panel is never active as well.

Would displaying info on thumbnails cause a recalculation?  My thumbnail display info is quite detailed... but it is just checking values so that should not cause a recalc right?

I just ran into the issue again and see the #SLOWCAT info... log attached.  I verified no category panels are open and the tabs are collapsed.  I can always tell when it happens because my display "distorts" and freezes... then pops back to normal.  All I did was add a keyword "snow" to an image and then move to the next image. 

From viewing the log - it seems to then take 9-10 seconds for the system to update the "No Title" and "No Copyright" categories... why are they being updated... and why is it so slow to do so?

Thanks for the assistance! - Andy.

Mario

Try closing all panels and repeat what ever causes this.
If the problem no longer happens, we have an additional clue.

I have nothing to add to all the analysis and details I have given above.

Jingo

I'll try... the hardest part is getting it to reproduce... doesn't always happen even though I haven't changed a thing (nor restarted the program).

Mario

IMatch needs about 6 seconds to determine that 47,658 of the files in your database have no title.
And 3 seconds to figure out that 43,233 files have no copyright.

There are the only two slower operations. IMatch calculates the collections once, so this is normal.
I see no repetitive collection updates in the log file. Looks OK.

Jingo

Quote from: Mario on May 28, 2020, 08:41:45 PM
IMatch needs about 6 seconds to determine that 47,658 of the files in your database have no title.
And 3 seconds to figure out that 43,233 files have no copyright.

There are the only two slower operations. IMatch calculates the collections once, so this is normal.
I see no repetitive collection updates in the log file. Looks OK.

I'm just not sure why it is calculating those two categories when I don't think I have anything open that would cause it to happen..

I'll try with the panels closed and keep an eye on things... thx Mario!

Jingo

Uggh... the severe slowness is back again.. a bit different this time - but I'm not sure what is happening.  I was just trying to browse to some bookmarks via a collection when IM froze for a long time... I took a backup of the log but cannot attach it as it is 19 MB.  Tons and tons of repeating:

07.13 12:36:41+    0 [370C] 10  M>                     > 20 CIMQueryCache::Update  'V:\develop\IMatch5\src\IMEngine\IMQueryCache.cpp(179)'
07.13 12:36:41+   16 [370C] 10  I>                       Query Cache Memory used: 4 KB for 350 entries and 7 tags. Scope size: 123
07.13 12:36:41+    0 [370C] 10  M>                     < 20 [16ms] CIMQueryCache::Update
07.13 12:36:41+    0 [370C] 50  I>                    QueryManager PrepareScope in 16ms.
07.13 12:36:41+    0 [446C] 10  M>                     > 20 CIMQueryCache::Update  'V:\develop\IMatch5\src\IMEngine\IMQueryCache.cpp(179)'
07.13 12:36:41+    0 [446C] 10  I>                       Query Cache Memory used: 0 KB for 786 entries and 7 tags. Scope size: 786
07.13 12:36:41+    0 [446C] 10  M>                     < 20 CIMQueryCache::Update
07.13 12:36:41+    0 [446C] 50  I>                    QueryManager PrepareScope in 0ms.
 

I forced IMatch to close.

Started IMatch again with my main database and did the following workflow:

1) Open Imatch
2) Switch to Collections Tab and choose BOOKMARKS  (123 photos are shown)
3) Select all and drag/drop onto Export Panel ( Imatch Batch Processor)
4) Configure a file rename, all metadata checked and hit OK

Imatch becomes totally unresponsive as it exports the 123 photos... it took about 60 seconds and then the dialog asking about pending metadata writes came up.. I said to process the pending metadata writes, the program went unresponsive again for 45 seconds.. and then I received the output complete dialog box.

I watched and all 123 images exported to disk rather quickly... it seems something else in the system is causing issues/processing.  Viewing the debug file, I once again see lots of Collection result fetches even though all of them are collapsed... still don't understand why the system is recalculating these all the time or why they are taking so long to generate....

Thoughts from the log?  Thx - Andy.

Mario

The query cache is updating because the file window contents have changed, by whatever reason. This is normal and with 16ms also surely not the source of the problem.

12:46:05 The Batch Processor dialog opens
12:46:16 The first image file is loaded
12:47:02 The BP scans the output folder via the engine to determine if the new files must be added to the database or not. They are not added. This takes < 1 second.
This indicates that the entire processing for the 123 images took 46 seconds. Which seems to be OK.

12:47:02: After the output files have been created, the BP checks if and which metadata must be copied (depends on your preset).
Since this is done as a managed copy from the source image to the target via ExifTool, the BP notifies you if source images has pending write-backs. 10 files affected.
12:47:19: IMatch now writes back the source files. This takes ~ 7 seconds.
The entire Batch Processor task takes 83 seconds, including about 10 wait time for the user closing the Pending WB dialog.

12:48:27 IMatch starts to index new files found in a totally different folder 'V:\Family_Photos\...

Nothing unusual or taking very long, as far as I can tell.

You work with files with drive letters like Q: or V:
Are these external drives? Network drives? This can slow down processing.

Your database is quite small, less than 100K files.
You should not get any warnings about slow categories or anything. Not for such a small collection and on a computer with 4 processors.
Maybe the Q: or V: drive cause slow downs? File system operations can take seconds on a NAS while taking only ms on a local SSD.
Virus Checker scanning the database all the time perhaps? Did you configure the database folder (!) and IMatch as an exception?

Jingo

Thx Mario... I suppose I'm just being impatient  - I agree that it doesn't seem to take long... it just concerned me because the first time through, the system was frozen and the log shows hundreds of those same cache messages.   8)

I'll just continue on my way.... All my data drives (V/Q) are local 7200 spin disks... except for the DB which is on SSD as is my boot/program file drive.

Thx for reviewing... Andy.