YAY! Write-back Performance now 2 to 5 times faster!

Started by Mario, October 21, 2018, 09:15:36 PM

Mario

Over the past weeks, I have refactored some parts of the IMatch 'Engine' (the core of the IMatch system) in preparation for IMatch Anywhere™ and the new IMatch WebServices Generation.

As a planned side effect, these changes opened up the opportunity to parallelize certain tasks in the engine, e.g. certain paths in the write-back.
Parallelizing means that IMatch, where possible, tries to split up big tasks into smaller sections and then process them simultaneously, to better utilize modern multi-core processors. This often results in noticeable speed improvements.
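
As a rough illustration only (this is not IMatch's actual code), the general idea of splitting a batch into smaller sections and processing them at the same time could be sketched in Python as follows. The names `chunked`, `write_back_file`, `write_back_chunk` and `write_back_parallel` are hypothetical, and the per-file work is just a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def write_back_file(path):
    """Hypothetical placeholder for the real per-file work."""
    return f"written:{path}"


def write_back_chunk(paths):
    """Process one chunk sequentially; several chunks run concurrently."""
    return [write_back_file(p) for p in paths]


def write_back_parallel(paths, workers=4, chunk_size=100):
    """Process chunks concurrently and return results in the original order."""
    chunks = chunked(paths, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the same order as the input chunks
        results = list(pool.map(write_back_chunk, chunks))
    return [item for chunk in results for item in chunk]


if __name__ == "__main__":
    files = [f"img_{n:04d}.jpg" for n in range(10)]
    print(write_back_parallel(files, workers=3, chunk_size=4))
```

Threads are a reasonable choice in a sketch like this when most of the time is spent waiting for an external tool or the disk rather than computing in the program itself.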

Write-back has so far been single-threaded because it is super-duper complicated. So many options are involved: user settings, per-file format configurations etc. And most of the time is spent inside ExifTool, which does all the hard work. I cannot make ExifTool faster. ExifTool already performs superbly, but when you write back 100 files, there is a lot of work to do. Also, propagation of metadata often causes updates to many files, including re-import of changed metadata afterwards. A small price for superior metadata quality, but nevertheless...

Breaking this up and making it able to run concurrently was rather complicated. But I hoped that it would be worth it. And it is!

On my PC, with a 6-core i7, I have seen 10 times better performance for the simplest test case: writing back 8,000 JPEG files after modifying rating and label took almost 10 minutes before; now it takes 1 (one) minute!

The JPEGs in this test don't contain EXIF/IPTC data so only the XMP record needs to be written. This is much faster than reconciling between XMP/EXIF/IPTC of course.
But I also get a 2 to 5 times better performance for processing regular RAW files, with embedded EXIF/GPS data and XMP in sidecar files.

The effective performance gain depends on the number of processor cores in your system, and how 'saturated' the disk is during write-back. If a disk is utilized 100% already, IMatch cannot speed things up further because it has to wait for the disk to finish. The results will be more pronounced for regular disks or network storage than for SSD storage.

I'm still in the final tests but results look very promising   :)
No guarantee though. I hope this will make it into the shipped product but it may take some time.
-- Mario
IMatch Developer
Forum Administrator
http://www.photools.com  -  Contact & Support - Follow me on 𝕏 - Like photools.com on Facebook

Kucera

Congrats! And thanks, never too fast or too much memory, can't wait :)

Mario

Quote from: Kucera on October 21, 2018, 09:28:27 PM
Congrats! And thanks, never too fast or too much memory, can't wait :)

This is still in my "experiments" branch of IMatch. I cannot make promises if and when this will become publicly available. But I hope this will make it.
-- Mario

mastodon

Super!!! I do metadata write-back often, I like to have all info in my files. (I know it is not very wise, but I am too old-fashioned.)

Jingo

Quote from: mastodon on October 21, 2018, 10:16:40 PM
Super!!! I do metadata write-back often, I like to have all info in my files. (I know it is not very wise, but I am too old-fashioned.)

Me as well... I know the benefits of using a database for speed.. but since I use many many different programs across platforms - having the metadata with the files (and in my backup files for futureproofing) is important... any speed bumps are always welcome!!

Mario

The general idea should always be to keep your metadata inside your images. This makes your files self-contained and you independent from whatever imaging or DAM software you use.
This allows you to keep control over your files and data. As it should be.

I'm aware that this is not really popular among software vendors, because locking customers into some sort of "cloud offering" or making them dependent on server-side technologies is currently the #1 goal for marketing and product management everywhere.

I'm no fan of that.
-- Mario

sinus

Quote from: Mario on October 22, 2018, 12:07:14 AM
The general idea should always be to keep your metadata inside your images. This makes your files self-contained and you independent from whatever imaging or DAM software you use.
This allows you to keep control over your files and data. As it should be.

I want to stress this, I fully agree.
Best wishes from Switzerland! :-)
Markus

Mees Dekker

Here too: full agreement. And when you need to send your files somewhere without all the keywords, ratings etc., it is easy to remove them by using a metadata template.

ColinIM

Quote from: sinus on October 22, 2018, 08:07:07 AM
I want to stress this, I fully agree.

I also agree, Markus.

As mentioned also by others (Jingo and mastodon), I too make frequent changes to the metadata stored in my files. In fact I spend about as much time revising old metadata as I do in adding new metadata, and I make frequent Writebacks as I work, so this (anticipated) speed improvement will be very welcome.

jch2103

Quote from: ColinIM on October 22, 2018, 07:49:33 PM
As mentioned also by others (Jingo and mastodon), I too make frequent changes to the metadata stored in my files. In fact I spend about as much time revising old metadata as I do in adding new metadata, and I make frequent Writebacks as I work, so this (anticipated) speed improvement will be very welcome.

+1 Any speedup in writing of new/revised metadata would be great.
John

BanjoTom

This will certainly be a WELCOME improvement to an already wonderful program!   :)
— Tom, in Lexington, Kentucky, USA

sinus

Quote from: ColinIM on October 22, 2018, 07:49:33 PM
As mentioned also by others (Jingo and mastodon), I too make frequent changes to the metadata stored in my files. In fact I spend about as much time revising old metadata as I do in adding new metadata, and I make frequent Writebacks as I work, so this (anticipated) speed improvement will be very welcome.

Exactly what I do, Colin.
I do edit some old images from time to time, when I have e.g. more information like Location.
This includes some scanned images from decades back.

I try to describe new images correctly, but new technologies often give us new possibilities and also more work.  ;D (e.g. destination GPS).
Best wishes from Switzerland! :-)
Markus

ben

+1
That would be great.
I often write back lots of metadata.
;D