File Updates = Long Time!

Started by Darius1968, June 08, 2021, 08:13:34 AM

Darius1968

I recently added some text to the "Usage Terms" field for about 35,000 files.  It took IMatch only about 6 min. to write the metadata back to my files, on my SSD.  However, the aftermath - updating the data internally in the database, i.e. calculations for the Timeline Nodes, etc. - is taking somewhat longer.  It seems to have taken 10 min. to handle around 5,000 files.  Is this normal?  The database is basically locked while this is happening, and I can't do anything.

Mario

As always: Include a log file in debug mode (which contains detailed timing information), the spec of the computer you are using (Slower notebook with 4 cores or workstation PC with 24 cores) etc.
Descriptions like "Takes 10 minutes" or "slow" are always subjective.

The log file shows if your database has 100,000 files or a million, which operation took how long, if there are categories which pull down performance etc.
And always make sure that your virus checker is not scanning the database on every access (make the database folder and maybe IMatch an exception). Virus checkers are now the culprit in 90% of all "IMatch is slow" cases.

Darius1968

I'll surely include a log file for you once the computations end and I can navigate IMatch again!  As for PC specs:  Windows 10/64, i9 processor, 16 GB RAM, on a 2 TB SSD.

Mario

You can copy the log file in Windows Explorer with Ctrl+C, Ctrl+V, even while IMatch is running. This allows you to take snapshots of the log file as needed.
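
The same snapshot idea can be scripted. This is only a minimal sketch: the function name `snapshot_log` and the destination folder are made up for illustration, and the location and name of the IMatch log file are not given in this thread, so you have to pass in the real path yourself. It assumes a programmatic copy of the open log file works the same way as the Explorer copy described above.

```python
import shutil
import time
from pathlib import Path


def snapshot_log(log_file: Path, dest_dir: Path) -> Path:
    """Copy log_file into dest_dir under a timestamped name and return the new path.

    Hypothetical helper; it only reads the source file and never modifies it.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = dest_dir / f"{log_file.stem}-{stamp}{log_file.suffix}"
    shutil.copy2(log_file, target)  # copies the content plus file timestamps
    return target
```

Calling it several times during a long operation leaves a series of timestamped copies, which can then be compared or attached to a post.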

Darius1968

Okay.  Then, from which directory do I get it, and what is the exact file name? 

Darius1968

This is the complete log file, after IMatch completed the operation, in about an hour. 

Mario

The database has about 500,000 files. Many image agencies get by with far fewer files.
IMatch can handle it, but of course everything that takes 1 second on a 100,000-file database will take at least 5 seconds on a 500,000-file database.

IMatch very frequently reports slow execution times in the Filter Panel.
Did you have one or more filters enabled while IMatch was reading and writing 35,000 files?
Which filters were visible and which were enabled?

I see many repeated queries for 'NOT .jpg' with 45,133 results. Are they from a metadata value filter?
The query time is 10 seconds (very slow!), probably caused by high database utilization due to the 35,000 imports running in the background.
And each newly imported file must invalidate all query caches in memory, causing IMatch to reload the potentially modified data from the database again.
This gets in the way of the import, reducing database performance.
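
To illustrate why an active filter hurts during a large import, here is a toy model of a query cache that is thrown away whenever files are imported. It is a sketch only and not how IMatch is implemented: the class `QueryCache`, the function `simulate` and the assumption that a visible filter re-runs its query once after every imported batch are all made up for illustration.

```python
class QueryCache:
    """Toy cache: remembers query results until something invalidates them."""

    def __init__(self, run_query):
        self._run_query = run_query  # stands in for the expensive database query
        self._results = {}           # query text -> cached result
        self.executions = 0          # how often the expensive query really ran

    def get(self, query):
        if query not in self._results:
            self.executions += 1
            self._results[query] = self._run_query(query)
        return self._results[query]

    def invalidate(self):
        self._results.clear()


def simulate(batches, filter_active):
    """Count expensive query runs while `batches` import batches arrive."""
    cache = QueryCache(lambda q: f"results for {q}")
    for _ in range(batches):
        cache.invalidate()           # new files arrived, cached results are stale
        if filter_active:
            cache.get("NOT .jpg")    # assumed: the visible filter refreshes itself
    return cache.executions


with_filter = simulate(100, filter_active=True)      # 100 expensive runs
without_filter = simulate(100, filter_active=False)  # 0 expensive runs
```

In this model every batch costs one full query while the filter is active and nothing while it is off. At the 10 seconds per query seen in the log, 100 refreshes would already add up to 1,000 seconds of database work.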

I also see that IMatch had to recreate some cache images. Nothing unusual.

The write-back is not performing all that well.
Your system reports 16 cores. Maybe you can improve performance by forcing IMatch to use fewer (say, 4) threads for write-back. See Process Control (Advanced Setting).

This applies especially when you write back while filters are active.
Also, when you have the Category Panel open (or the Category Filter) or you show categories in your thumbnails, IMatch is forced to recalculate all categories after each imported batch of files (5 - 20).
When writing and importing 35,000 files at once, this can become quite a drag.

So, deactivate the filter.
Maybe close the Category Panel / collapse the Category Filter if loaded.
Reduce the threads used for import and write-back to, say, 4 to 8 and then repeat the test with a few thousand files to see if this changes anything.


sinus

Quote from: Darius1968 on June 08, 2021, 08:13:34 AM
...I mean, it seems to have taken 10 min. to handle around 5,000 files.  Is this normal? 
The database is basically locked while this is happening, and I can't do anything.
...

For me, 10 min for 5000 files sounds normal. But as Mario mentioned, that is of course subjective.
I mean, you probably don't do such things for 5000 files every day?
You cannot do anything with the DB in this case, I agree, but you could relax during this time and drink a fine coffee or tea.  ;D (Of course, this is not serious, Darius, it simply came to my mind  8))
Best wishes from Switzerland! :-)
Markus

Jingo

As I too have faced this same issue, I wonder if IMatch can be enhanced to "recognize" some of these "long time" scenarios and either a) warn the user  or b) diagnose, warn the user and temporarily suspend/close some of them.

For example, if a particular filter will cause an extra 25 minutes of processing time because it is active, perhaps it could be turned off automatically during the write-back/re-read process if more than "X" images are involved?  The same goes for an open panel such as categories... simply close the panel.

Perhaps this just isn't possible or feasible, but I wonder whether it would make things more efficient during very "expensive" operations like updating more than 10,000 files?  Just an idea (perhaps a very bad one)  8)

Mario

The OP let IMatch write back 35,000 files in a single batch. Videos, PNG files, all kinds of files.
Write-back time is about 0.2s per file, and IMatch writes multiple files in parallel.
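
As a back-of-the-envelope check of these numbers, the sketch below estimates the ideal wall-clock time for the write-back. The function name `writeback_minutes` is made up, the thread counts are assumptions (the thread does not say how many write-back threads IMatch actually used), and the model assumes perfect parallel scaling with no disk, database or CPU contention.

```python
def writeback_minutes(files, seconds_per_file, threads):
    """Ideal wall-clock minutes for a write-back, assuming perfect scaling."""
    return files * seconds_per_file / threads / 60


# 35,000 files at about 0.2 s per file, as stated above.
serial = round(writeback_minutes(35_000, 0.2, 1), 1)    # 116.7 minutes
sixteen = round(writeback_minutes(35_000, 0.2, 16), 1)  # 7.3 minutes
four = round(writeback_minutes(35_000, 0.2, 4), 1)      # 29.2 minutes
```

The 16-thread figure is in the same range as the roughly 6 minutes reported at the start of the thread. Real systems will not scale this cleanly, which is why fewer threads can sometimes do better than this simple model suggests.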

Optimizations as you suggest are already in place.
But IMatch cannot show invalid / false information just because it is writing back files in the background.
False information could lead the user to do stupid things or could even cause data loss.

So, depending on how you have configured IMatch, how many panels you have open, whether you have active filters, the search bar active, the Category Panel open, or maybe show variables based on categories or collections in the File Window - there will always be a specific configuration and use-case that causes less than optimal performance. This is especially true if you keep working while IMatch has to write back 35,000 files (and reload them afterwards) on a database containing 500,000 files, a size that many IMatch competitors consider corporate-grade.
There are limits, even for IMatch. Physics and stuff.

IMatch cannot know in advance if a user plans to write back 35K files in one go. Or what the user will do in the UI while this is running.
It is impossible to anticipate everything users might do and to come up with plans for dealing with it - without ruining the user experience or performance for all other users.

A 16-core processor as used here is also pretty rare. And it may mess things up because IMatch may try to do too many things in parallel - and then the CPU gets too hot and throttles down.
Or maybe the virus checker just kicks in and starts scanning the database on every access by IMatch (thousands per second) or scanning every modified file - which messes up ingest performance.
I have also seen virus checkers terminate ExifTool, which causes IMatch to wait for some time and then restart the instances - over and over.

The log file posted by the OP shows IMatch performing well, happily churning through write-backs and ingests. A filter is active which causes 10s (!) of database activity every time it is updated. This is most likely the cause.

Also, keep in mind that most write-backs performed by users contain a few to a few dozen files (according to telemetry). Not 35,000 write-backs for a 500,000-file database.

Jingo

I understand... and perhaps there isn't anything that can be done.

I've had this happen to me, and when we analyze the log files, we find a similar trend... "expensive" panels that are "open" (not necessarily used - just active in some manner) and cause additional work for IM.  I'd be happy to completely disengage panels, close active filters, etc. before updating the DB if I understood how they would negatively impact the system.  As end-users, we often just don't know until it is too late and the write-back is already occurring.

But - as you mention - perhaps long write-backs are rare.  All good.. just "thinking out loud".   :)

Carlo Didier

Maybe a solution for the user would be to create a very simple (i.e. no "expensive" panels open) workspace which he can then select before doing large scale updates.

ubacher

Or a button "Suspend any interactive activity" until updates done.

Mario

When you write back more than a certain number of files, IMatch blocks the user interface with a "Writing back metadata" dialog and disables almost everything.