Performance/Speed for file operations with images

Started by WebEngel, July 21, 2023, 08:53:53 PM

Previous topic - Next topic

WebEngel

I wanted to discuss the speed of Imatch for file operations.

In general, renaming and moving images is very slow.  Moving or renaming movies is much faster.  Moving an entire folder with pics is also very fast.

Here are a couple of measurements

Imatch 2023
Moving 75 images, all of them with JPG Buddies, some with XMP buddies.  None have categories.  5:00
Renaming 153 images, all of them with JPG buddies, none with XMP, no categories (except data-driven), small metadata changes: 13 min (with debug logging)
Task manager during rename for Imatch: CPU was at 28% -- 33%, Memory at 230 MB, disk at 0,1 MB/s, network idle

Imatch 2021
Renaming 162 images: 18:18, of which 30s for "resolving duplicates"
Moving 14 images within one partition: 31s (first batch), 26s (2nd)
Renaming 62 images: 5:05 (first estimate was 36 minutes)
Moving 12 images within one partition: 23s
The HD LED only occasionally flashes

Is what I am seeing normal?  Or is there some kind of a system issue for me?  I do understand that file operations in a database environment with tons of references is slower than in a simple file system.

Regards
martin

thrinn

That is definitely not what I see. Just moved ~500 pictures (JPG + RAW) from one folder to another. Took about 20 sec.
You say that moving or renaming movies is much faster. But movies are in general much bigger in terms of file sizes, so this at least very suspicious.

One thing I see in your log file are some lines like this:
07.21 20:31:20+17250 [39A4] 50  I>      #FindRelated: 75 masters, 18 definitions, 72 links found. 5003253 files analyzed.Mario can surely better explain the exact meaning of this log entries, but my interpretation is that IMatch has to check 5 million files for versions - but we are talking about only 75 masters!

So, I would first check your File Relation rules. Maybe they are not specific enough (in terms of where to search for versions). And you have apparently 18 different rules - which is quite a lot. You could try to repeat your test (in a test DB!), but disabling the File Relation rules first. That could give you a hint on what is going on.
Thorsten
Win 10 / 64, IMatch 2018, IMA

WebEngel

Thanks, Thorsten

Well, movies may be bigger, but the file size does not make a huge difference when moving between folders within the same partition.  At least from Windows file explorer.

Good idea with the file relations.  However, did not improve things for me.  I disabled almost all file relations and tested again.  Same speed, 1:21 for 32 files (16 Raw plus Jpg).

Martin

Mario

The log file shows a lot of stuff, not just copying/moving files. The log file covers about 3 days, from the 16. to the 21. of July. A lot of data for me to wade through.

The first thing I noticed in your log is a warning that the system memory is utilized to 83%. This is not really good. Your system has only 8 GB of RAM (the minimum considered for Windows) and only about one GB is available. At this point in time, IMatch is using only about 450 MB of RAM.

I see a lot of write-backs being performed, a lot of propagation from masters to versions being performed.
Metadata is changed, write-backs are performed.
Write-backs take 1-2 second per file on the E: disk.

QuoteRenaming 162 images: 18:18, of which 30s for "resolving duplicates"
This step checks in-memory for duplicate file names  being created (sub-second for only ~200 files) and on-disk if there are name collisions. If this takes 18 seconds, either the disk is VERY slow or something is interfering...?

Which virus checker do you use?


QuoteRenaming 153 images... 13 min 
Ridiculous.

I see a rename operation.
This task performs some checks, e.g. if the file already exists in the file system, if there are buddy files to deal with and if these already exist in the file system etc.

Then it renames the file in the file system and if this is successful, it performs the rename in the database.
In your log this looks like this:

A) 07.21 20:17:58 CIMEngine5::FileOpRenameFile
B) 07.21 20:18:30    CIMEngine5::RenameFile  'V:\develop\IMatch5\src\IMEngine\IMEngine5.cpp(7216)'

Between 20:17:58 and 20.18:30 is a 32 second gap where IMatch does nothing.
Between A) and B), the only operation that may take measurable time is the check for existing files in the file system and the actual rename operation. A) does the checks and calls Windows to rename the file, B) is executed when the file system rename operation completes.

If it takes ~32 seconds to rename a single file in the file system, it explains why everything is so slow.

The next FileOpRenameFile  takes 5 seconds, then 5 seconds, then 6 seconds etc. All very slow.

I renamed a set of 200 files on my PC (files on spinning disk, master files and versions mixed) and IMatch did not even log a timing for the FileOpRenameFile task - which means it was took only a few milliseconds.

95+% of the time spend renaming a file is spent "in" the file system, the rest is database updates.
File relations with excessively large sets of potential versions to check after a file was renamed (which could make the file a master or a version...) may slow things down.

I also see a lot of "slow trans" logs, which means that parts of IMatch had to wait for other parts to finish performing database operations. This is usually an indicator of the database is not performing super-fast. But I see this usually only with very large databases (say, 500,000+ files) or with databases stored on spinning disks. This is not much of a problem anymore with modern SSD storage.

Is the database on your fastest SSD?

I see that ExifTool also struggles a bit with reading data. It takes almost 8 seconds to extract metadata from 5 files in the folder E:\Fotoalbum\Raw_Master\2023\04\. IMatch reads metadata with multiple ExifTool instances in parallel, but if it takes 8 seconds to load metadata from 5 files (and 0.4 seconds to process all the data into the database), things will be slow.

It also seems to be quite slow to update data-driven categories. For example, it takes a whooping 11 seconds to update Make and Model. And that's just one of the data-driven categories. And updating a data-driven category will "saturate" the database, making other things works.

All this looks to me as if the system is not performing too well. Everything that is related to disk I/O (database, renaming files) is unusual slow.

Maybe the virus checker interfering with IMatch?
Which virus checker do you use?
Did you configure an exception for previous IMatch versions?
If so, did you update it to the new executable file name IMatch 2023 uses?

See the first TODO here: https://www.photools.com/help/imatch/migrate_imatch2023.htm






WebEngel

Hi Mario

Thanks for the analysis effort.

Imatch is installed on a SSD; the DB is on the same SSD, and the image files are on a magnetic disc (the faster "conventional" recording CMR, 50% free space).

Renaming or moving movie files in Imatch is fast.  Renaming image files with some other rename tool is fast.  Moving files with Windows File Explorer within one partition is fast.  For all of these operations, the disc does not need to do a lot of work, as only a few bits are changed (Windows does not move files but change the address pointer).

I also did some file copy tests on the magnetic disc some time ago, not a benchmark, but a real-world test to copy huge files, and the disc performed to what it should.

Remember, the hard disk is idle during the move; it hardly ever flashes and is below 1 MB/s. Actually the entire PC is idle except for the CPU which is at 30%.

I am going to get a new mainboard with 16 GB, faster video and faster CPU next year.  I doubt this will improve Imatch performance significantly, as all components except the CPU are idle already now, and the one CPU will only be 20% faster or so.

I am using Windows 10 Virus protection.

Imatch executable, Imatch DB, *ARW, *JPG are and have been excluded from virus scan, for at least the last 3 Imatch versions.

I have temporarily switched off real-time protection and repeated the test.  No difference.

I have freed up memory to go to 53% and repeated the test.  No difference.

martin

I also did both at the same, virus off and 53% memory, again no difference.


Mario


QuoteRenaming or moving movie files in Imatch is fast.  
Where is the difference to your files then? The metadata for movies is the "same", versions?, IMatch uses the same Windows functions to rename or move all files, ...

QuoteRenaming image files with some other rename tool is fast.
 Moving files with Windows File Explorer within one partition is fast. 
IMatch needs to do a lot more work than simply renaming a file in the file system.
For moving and copying files, it uses the same routines Windows Explorer uses.

Quote(...) Windows does not move files but change the address pointer.
I know.

QuoteImatch executable, Imatch DB, *ARW, *JPG are and have been excluded from virus scan, for at least the last 3 Imatch versions.
Note that IMatch 2023 has a different executable file name.

Like often with such problems that happen only on one computer, it is very hard to identify the source of the problem.
Please switch IMatch to debug logging.
Move 10 files from one folder to another on the same disk (I believe this is what you do?)
Use Help menu > Copy LogFile to copy the log file and attach it.
The other log file, spanning 3 days, has just too much noise in it.
I'm interested in the log file entries starting and ending with CIMTaskFile::MoveFiles

On my PC, with 10 RAW files, this looks like this.

>  0 CIMTaskFile::MoveFiles
...
<  0 [235ms] CIMTaskFile::MoveFiles

The entire operation takes 0.25 seconds.
Let's see what is logged on your system, if there are "gaps" where IMatch is not logging anything etc.


WebEngel

#6
Hi Mario
was on vacation for a week, hence the delay.
Yes, I know that Imatch does much more than File Explorer during rename and move.
Yes, I know the Imatch 2023 has a new file name; it has an exception in Windows Virus Protection.

OK, did the test.
The log file contains only opening Imatch and then moving the files.
10 ARW files moved, along with 10 Jpgs.
Real-time virus scanning was disabled
Task Manager: CPU was at 28% during the operation and 10s after.  HD was alway below 5%, except after a short peak at the very beginning.  Memory was always around 65%.

Martin

Mario

The first thing I see is that IMatch loads your database in about 7 seconds
13:38:27: Database load complete in about 7 seconds
13:38:28: Geo locations loaded
.... PAUSE ?
13:38:45 The engine reports one or more offline folders
... PAUSE?
13:39:00: init instance completed, after a whooping 45 (!) seconds.

The typical time interval between IMatch reporting the database load complete and init instance completed is one second.
On your system, is is 33 seconds. Unusual.

I see an unusual number of TranBegin wait loop entries.
This is logged when a part of IMatch tries to open database transaction but cannot, because another part of IMatch is currently exclusively working with the database.
This can occasionally happen, but not that often for a database with only about 120,000 files.
It happens very often while IMatch is moving files. Moving a file requires some database updates, and every time IMatch tries to update the file path in the database, I see two or more TranBegin wait loop, each with about 1000ms of wait time.

This might explain why your file move operation takes so long.
The time for moving the file in the file system is logged as 0, so it is very fast.
But then IMatch has to wait 2, 3, 4 or more seconds for the database to become available so it can finish the move operation by updating some photools.com tags (file path).

Your move operation takes 70 (!) seconds. And virtually 100% of the time is spent waiting for the database.
Usually there are no TranBegin loops logs during file moves.
IMatch moves a file, if successful, updates the database, then moves the next file.

Suggestions:

The database is quite small, but not performing well at all.
The database is on D: Is this your fastest SSD?
This looks like a on-access virus checker is scanning the database every time IMatch tries to write to it, making IMatch block for 2 to 4 seconds before it can access the database.

Check for off-line folders, especially on networks or external disks.
Sometimes Windows can take several seconds for figuring out if a folder is off-line, while making IMatch wait. This can slow down certain operations.


WebEngel

Thanks for analyzing, Mario.

I am using Microsoft Defender. For the last test today, Real-time scanning was disabled.  Exclusions from Defender see attachment.

The DB is on an SSD.  Samsung 850 EVO 250GB

The Winsat Benchmark looks as follows:
C:\WINDOWS\system32>winsat disk -ran -write -drive d:
Windows System Assessment Tool
> Running: Feature Enumeration ''
> Run Time 00:00:00.00
> Running: Storage Assessment '-ran -write -drive d:'
> Run Time 00:00:00.84
> Dshow Video Encode Time                      0.00000 s
> Dshow Video Decode Time                      0.00000 s
> Media Foundation Decode Time                 0.00000 s
> Disk  Random 16.0 Write                      369.29 MB/s
> Total Run Time 00:00:01.09

C:\WINDOWS\system32>winsat disk -ran -write -drive e:
Windows System Assessment Tool
> Running: Feature Enumeration ''
> Run Time 00:00:00.00
> Running: Storage Assessment '-ran -write -drive e:'
> Run Time 00:00:11.75
> Dshow Video Encode Time                      0.00000 s
> Dshow Video Decode Time                      0.00000 s
> Media Foundation Decode Time                 0.00000 s
> Disk  Random 16.0 Write                      2.27 MB/s
> Total Run Time 00:00:11.81
D: is my SSD for the Imatch DB and for the Imatch executable.  E: is my magnetic disc for the images and video files.

Regards
martin

Mario

Disk benchmarks don't matter.
No attachment.
Did you make a new exclusion for IMatch 2023, since the name of the executable has changed?

graham1

I find that generally IMatch 2023 has become faster.  I particularly like the fact that if some (but not all) files in a scope have keyword changes, even though all files are selected, only those with actual changes are updated and need to have the metadata and file saved - this is such a time saver for me, compared to IMatch 2021 in which all the files selected were treated as having been changed and needed to be saved whether or not their metadata had actually changed.

Seeing the reference in this thread to IMatch logs which refer to a (notional) V: drive leads me to ask whether or not the presence of an actual V: drive makes any difference?  My permanent internal hard disks in my computer take me up to M:  If I have my USB external backup disks attached, these are configured as T: to Y:, meaning that sometimes (actually quite often) I have an actual V: drive attached.  Would this in any way have the possibility of confusing IMatch?  I have not noticed that it makes any difference whether or not the V: drive is attached, but I have not tested it rigorously.

Graham

Mario


QuoteWould this in any way have the possibility of confusing IMatch? 

No. V: is just the virtual drive I use for development. When IMatch logs the name of a source code file, it will always use the name on which the file is stored on my system. This is all fixed when I compile IMatch on my computer.
If the user has a V: drive or not is irrelevant for the logging.

sinus

Quote from: graham1 on August 01, 2023, 11:13:25 AMI find that generally IMatch 2023 has become faster.  I particularly like the fact that if some (but not all) files in a scope have keyword changes, even though all files are selected, only those with actual changes are updated and need to have the metadata and file saved - this is such a time saver for me, compared to IMatch 2021 in which all the files selected were treated as having been changed and needed to be saved whether or not their metadata had actually changed.

...

Graham
Hmm.
Maybe I am totally wrong, but I am quite sure, this was also the case in previous versions. I wonder, but I can be wrong, of course. :)
Best wishes from Switzerland! :-)
Markus

WebEngel

OK, sorry, indeed forgot the attachment.  Anyway, that allowed me to get rid of the "offline folders".

So here is a new upload file.

And again the facts:
Moving images within one partition is slow
moving movies is fast
Files are on magnetic disc, Imatch DB is on SSD
Virus exceptions should be correct (see screenshot)
Switching Microsoft's Defender's "Real-time protection" on or off does not matter
Freeing up memory does not matter
Getting rid of most file relation definitions did not improve things

Mario

Same results as before (see my long explanation above).
IMatch is spending most of the time waiting on the database.

IMatch reports a couple of times that it takes unusually long to update the collection "Collections\Label".
Not other collections, just this one. This is mostly an "in-database" operation.

Something is not happy on your system. Everything that needs database access is slow. No idea why or what might keep the disk so busy or the database response times so slow.

Try to reduce the number of processor threads IMatch uses for background processing.
Edit > Preferences > Application.
Search for "Process Control" and set all four options to 2 (the default is 0, which means auto).
Close the dialog with OK and restart IMatch.

The system reports 4 processor cores and 8GB RAM. Over 70% of the RAM are already in use before IMatch starts.
Although IMatch only uses 500MB RAM in peak, a system with such little free memory may negatively impact IMatch and Windows.