Parallel metadata write back speed

Started by Aubrey, November 05, 2018, 07:45:12 AM


Aubrey

I'm currently adding metadata for many files I have digitized.

Initially I thought that parallel metadata writeback was faster, but now I'm starting to wonder whether this is the case on my machine. My original tests were made with 10 MB CR2 (Canon raw) files and smaller JPG files.

When I change metadata in TIF files of about 100 MB, writeback appears to take a long time. The task manager shows that the disk is working at 100%, so it clearly cannot write back any faster.

I wonder if writing back many files simultaneously means that the write head of the disk is moving from file to file as it tries to write them all back. I therefore wonder if serial writeback, where one write completes before the next starts, would be faster.

Is there a switch to select serial writeback rather than parallel, or perhaps a switch to limit the number of files being written back simultaneously?

Aubrey.

Mario

There is usually a lot of I/O idle time that can be used by processing files in parallel. There is also in-memory calculation and tag processing, which can be parallelized.
If your files are on external disks or network/NAS storage, even more can be gained.

If your disk goes to 100% while writing a 100 MB TIFF file this is good. This means that IMatch works as fast as possible.
But writing 100 MB should take a few seconds max, even on slow spinning disks.
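
A back-of-the-envelope check of that estimate, as a small Python sketch. It assumes the file is read once and written once in full, and the throughput figures are round assumed values (not measurements from any machine in this thread). Head movement caused by several simultaneous writes would lower the effective throughput of a spinning disk.

```python
def estimated_rewrite_seconds(file_mb, throughput_mb_s):
    """Time to read a file once and write it once at a given sequential throughput."""
    return 2 * file_mb / throughput_mb_s

# Assumed sequential throughput values, for illustration only.
hdd = estimated_rewrite_seconds(100, 100)   # spinning disk at roughly 100 MB/s
ssd = estimated_rewrite_seconds(100, 500)   # SATA SSD at roughly 500 MB/s
print(f"HDD: {hdd:.1f} s, SSD: {ssd:.1f} s")   # HDD: 2.0 s, SSD: 0.4 s
```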

If the disk is busy, Windows automatically delays parallel write-backs. Also, there is the file system cache, which means that application writes will be cached into memory until the disk becomes available again. Again, this can be maxed out by using more than one write-back.

As explained in the release notes and also in the IMatch help (Process Control (Advanced Setting)) you can disable parallel write-back by setting the corresponding setting to 1, or adjust the number of parallel write-back threads to your machine by dialing in a number. This works the same also for the read and metadata import threads. The defaults usually work best but you can make experiments with these settings.
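
What a thread-count setting like this does can be sketched with a bounded worker pool. This is a generic Python illustration, not IMatch's code; `run_write_backs` and `fake_write` are hypothetical names, and the short sleep merely stands in for disk I/O. With `threads=1` the writes run strictly one after another.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def run_write_backs(files, threads, write_one):
    """Call write_one(file) for every file, using at most `threads` workers."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(write_one, files))

# Demo: a fake writer that records how many writes overlap.
lock = threading.Lock()
active = 0
peak = 0

def fake_write(name):
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.01)              # stands in for the actual disk I/O
    with lock:
        active -= 1
    return name

files = [f"scan_{i:03d}.tif" for i in range(8)]
done = run_write_backs(files, threads=1, write_one=fake_write)
print(done == files, peak)        # serial run: peak concurrency is 1
```

Raising `threads` lets more writes overlap, which helps while the disk has idle time and hurts once a spinning disk spends its time seeking between files.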
-- Mario
IMatch Developer
Forum Administrator
http://www.photools.com  -  Contact & Support - Follow me on 𝕏 - Like photools.com on Facebook

Aubrey

Mario,
Thanks, I had forgotten one could select the number of threads.  :(
The disk is a WD 3 TB 7200 rpm disk attached to the machine.
I have two E5-2687W processors, each with 8 cores and 16 threads. So I suspect that too many files are being processed in parallel for the disk to handle. I'll reduce the number of parallel writebacks.

Thanks,
Aubrey.

Mario

IMatch limits the threads to a max of 16 or 12 anyway, independent of cores and hyperthreading. Of course, when you let loose 16 cores on a poor old spinning disk, it will be maxed out. For a system with that many cores, several TB of SSD storage is more appropriate  ;)

Aubrey

I'm leaving the parallel advanced setting at default - works best for me.

In principle, should it take much, much longer to write back metadata (dates) to a large file (a 100 MB TIFF) than to a 20 MB NEF file? It appears to on my machine.

It's not a big deal - I just complete all changes and then go away for lunch while files are updating. I'm trying to understand long updating of recently digitized images.

Mario

This all depends.
ExifTool may need to splice the file to make room for new metadata. It also depends on whether the file has legacy IPTC data and/or EXIF data, because then ExifTool has to perform a lot more work. Since ExifTool does not perform "inline" updates but always writes the file as a stream (which is slower but a lot safer) the absolute file size matters.
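
Why a streamed rewrite scales with file size can be shown with a toy sketch in Python. This is not ExifTool's implementation, and the "metadata" here is just a byte prefix rather than real TIFF structure; `rewrite_with_prefix` is a hypothetical helper. It writes a complete new copy to a temporary file and swaps it in only after the copy has succeeded, so the original stays intact if anything fails midway, at the cost of copying every byte.

```python
import os
import tempfile

def rewrite_with_prefix(path, new_header, chunk_size=1024 * 1024):
    """Write new_header plus the original bytes to a temp file, then
    atomically replace the original. Returns the number of bytes copied."""
    copied = 0
    folder = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=folder)
    try:
        with os.fdopen(fd, "wb") as out, open(path, "rb") as src:
            out.write(new_header)
            while chunk := src.read(chunk_size):
                out.write(chunk)
                copied += len(chunk)
        os.replace(tmp, path)     # the original is untouched until this call
    except BaseException:
        os.unlink(tmp)            # discard the partial copy on any failure
        raise
    return copied

with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "image.bin")
    with open(p, "wb") as f:
        f.write(b"x" * 3_000_000)
    # A 4-byte change still copies the whole 3,000,000-byte payload.
    print(rewrite_with_prefix(p, b"META"), os.path.getsize(p))
```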

I've made a quick test with a 200 MB TIFF file on a SSD (with legacy IPTC and EXIF and GPS and embedded XMP of course) and the write-back takes about 4 seconds. Including Metadata synch between IPTC/EXIF/GPS and XMP.

Aubrey

Quote from: Mario on November 06, 2018, 10:23:51 AM
I've made a quick test with a 200 MB TIFF file on a SSD (with legacy IPTC and EXIF and GPS and embedded XMP of course) and the write-back takes about 4 seconds. Including Metadata synch between IPTC/EXIF/GPS and XMP.

Thank you for your test.

I moved some files to my 256 GB SSD (also my system disk). I get a few seconds of writeback time for simple XMP data.

I can see the benefit of SSD for changing metadata for large files.

Aubrey.

Mario

Fast SSDs are just pure gold. Thankfully, the prices are dropping. A good Samsung EVO 860 1 TB SSD is now at only 160€  :D

Aubrey

Mario,
I think that there is an issue with speed.
I changed two metadata fields on two images and wrote back using the default settings. That went fine and relatively quickly (not on the SSD).

Then I changed the application setting to 1 for each of the three parameters. Writeback took some 3 minutes and completed with a yellow triangle. Later I did a normal rescan and the triangle cleared.

I've attached the log file; it's not in debug mode, though.

On closing IMatch and reopening, I then applied a writeback (still with application settings set to 1) and the writeback went relatively fast.

It appears to me that when one changes the number of threads, IMatch must be closed and reopened for the change to be applied correctly.



Mario

Yes. Don't play with these parameters unless you have a real need. Forcing IMatch to run single threaded for metadata import and indexing will severely affect performance. The minimum should be two at least. But, best to keep these to defaults.

Aubrey

My point is that when a change is made to "Threads for file Import", "Threads for metadata input" and "threads for metadata writeback", it is important to shut down and then restart IMatch.

There ought to be some message in the application window warning the user to do this. Perhaps in the message box at the bottom of the preferences|application tab.

Aubrey.

Mario

I cannot think of everything, sorry.
A quick look at the code shows that the changes are applied immediately.
Please file a bug report. I doubt that more than 1% of all users will ever tinker with these settings.

Aubrey

Quote from: Mario on November 06, 2018, 08:20:36 PM
I cannot think of everything, sorry.

Mario,
I was certainly not disparaging your excellent work. The enhancement is working well for me with some tweaks on my machine.

As you indicate, most people will not have some 16 nodes on their machines. (My machine was originally bought for oil reservoir simulation and a lot of that work is parallelized).

I'll file a report, though it's not a bug, simply an enhancement that I'm suggesting.

Thank you,
Aubrey.


Mario

Then file a feature request so this does not get forgotten.