Writing back Metadata in 3 phases.

Started by scw2wi, March 13, 2016, 10:10:49 AM

Previous topic - Next topic

scw2wi

I have made the observation, that when I write back changed Metadata to Images, IMatch does this in 3 steps.

1.) Writing back the metadata.
2.) Rescanning the changed images.
3.) Recalculating the changed categories.

Depending on the number of changed images, the first 2 steps can also take hours, the last step is faster.
If my observation is correct, IMatch starts with step 1, if this is finished step 2 is done automatically,
if this is finished IMatch stops and goes in idle mode.

If I then make any action in IMatch (e.g. navigation to another context) IMatch starts with step 3
and if this step takes e.g. 5 or 20 min, I have to wait. (I run step 1 and 2 usually over night)

Is my observation correct or do I have a wrong configuration?
If it's correct, would it be possible to start with step 3 automatically when step 2 is finished.
This would speed up my workflow significantly.

Walter

Mario

A question like this never came up, because re-calculating categories takes seconds, not minutes.
And it is performed in the background, when no metadata operations take place. There is a lot more going on that just calculating categories in the background...

Please provide log file, specifics about what you do, how many data-driven categories you have etc.
The log file will show us all that, and which operations take long.

Nothing should take hours. And recalculating a data-driven category even for a 200,000+ files database takes a few seconds or maybe even 10 or 20 if the database is huge and the disk is slow. The log file will tell.

Please switch to debug logging under Help > Support before performing the operations which are slow.


scw2wi

Here is my testscenario with logfile in debug mode.
Since the hole logfile is about 6 MB zipped I had to delete some lines (21:15-23:00 & 23:00-00:30) that are hopefully not having significant more information.

I have about 2x 80.000 images in my database and changed about 5000 of them for writeback to NEF & JPEGs.
All drives are local but not the fastest ones, Ressourcemonitor says about 5 MB/s.
I know that this sounds very slow but it's sufficient usually, maybe I am looking at the wrong column.

Writeback runs via a script ("Writeback Metadata" by JohnZeman) which could make it a little bit slower, but that's not the issue.

Phase 1: 21:04 - 23:15 (~2h) writing back all changes in NEFs and sync metadata with JPEG Versionfiles.

I know that this is approx. 10x slower than it was with Nikon ViewNX, but ExifTool is doing much more than ViewNX did.
Since I usually run writeback over night I do not care if it takes 10 min, 1h or even more.

Phase 2: 23:15 - 00:43 (~1,5h) re-reading changed metadata

Also this is slow, but again I do not care since it starts immediately and is finished till I wake up next morning.

Phase 3: 06:53 - 07:00 (7 min) Actualising Categories

Altough this is the fastest phase of all I would propose to start this phase immediately when phase 2 is finished.
I see no reason to wait for the first user action.

What I observed during this phase 3 is sometimes a white line below the IMatch status line and sometimes even a complete white screen.
In both cases Ressourcemonitor marks it red and says: Application not responding.
I am woundering if this only happens on my computer because no other application is showing such a behavior.

Walter


[attachment deleted by admin]

Mario

5 MB/sec is surely the current utilization of your disk, not the maximum transfer rate. A normal hard disk can write about 100 MB/sec and a modern SSD 500 MB/sec.

ExifTool needs rather 30 seconds to load data for 20 files. That's on the low end, even for NEF files. It pulls 300 - 400 (!) tags from each of your files. Do you use the default metadata file format settings (Edit > Preferences > Metadata 2: Tag Manager) or have you enabled additional Nikon metadata?

Since you use a script to write-back, is this script properly designed for performance? I mean does it use transactions to speed up the database etc.? This can make a huge difference.

Did you compare the native write-back time of IMatch with the write-back time of the script?

During import, I see many locks on the database. The background processes responsible for ingesting metadata compete with the user interface over the database. The user interface tries to update collections, which is really slow on your system. This slows down the import process in the background as well, even causing 30s timeouts. Very unusual.

Updating data-driven categories is also realy, really slow on your system. Again this causes locks and unnecessary delays for other running operations, from category updates to collection updates to background processing and file imports. Not good.

How many data-driven categories do you have? Do you need to update them all all the time? Maybe setting some of them to manual update will improve things a lot. Or maybe remove data-driven categories you don't need anymore.

I assume you run a database optimize regularly? Your database is only 150K files and should be much faster.  Maybe it's the hard disk or some other software (virus checker!) slowing everything down. Have you added an exception for the folder containing the database in your virus checker? That's important.

I would also consider spending 80$ for a 256 GB SSD. The performance gain is truly massive.

scw2wi

Thank you Mario, these are a lot of hints and I will try to follow them.

This is what I understood.

*) ExifTools pulls 300 - 400 tags from each files.

Well, I have added a lot of Nikon specific Tags in Tag Manager:
Nikon, Nikon AFInfo, Nikon AFInfo2, Nikon LensData00, Nikon LensData0204, Nikon VRInfo, NikonCustom SettingsD7000
I could remove other Tag Groups I am not using if this would speed up ExifTool, would be worth checking it.

*) Is the script properly designed for performance?

Not at the moment, but I will add the suggested code now.
In a loop
   Set col = Database.Collections.Get(ctypePendingMetadataWriteBack)
   For Each f In col.Files
I am manipulating XMP::xmp\Label\Label\0 and XMP::photoshop\Urgency\Urgency\0
and writing back this file
   Set data = f.TagData
   data.WriteBack (True)
That's it.

I will add trans.Begin and trans.Commit how it is described in Help File.

I have used the native write-back command of IMatch in the past and it was not significantly faster.

*) How many data-driven categories do you have?

I am using the default ones +4 of my own but that will become more in next time.
I have switched them to manually and will see if there is any difference.
I think the @Keywords categories cannot be switched to manually.

*) I assume you run a database optimize regularly?

Yes I do and I have also added an exception for my virus checker.
I will disable my virus checker for my next performance test and will see if it makes any difference.

I will also consider to find a free slot for an extra SSD.

I hope with all that improvement hints I will find the reason for that many locks.

Could you please consider to start category actualization automatically after write back is finished?

Walter

Mario

QuoteCould you please consider to start category actualization automatically after write back is finished?

The background processing is a very complex thing. Categories are always calculated on-demand, or when the IMatch engine is idle.

scw2wi

In that case I see a real advantage in having categories update set to manual.
If I want to see a category, I can update just that single one in nearly realtime.

At the moment I can set the auto update option to ON and OFF.
It would be good to have a 3rd option "On Open",
so that Shift+F5 is not needed any longer.
If I open the tree, it would get refreshed, if needed.

Walter

Mario

#7
I have never given this a second thought, really. My own database has about 240,000 files and I have 12 data-driven categories. They are always updated in the background, without causing any fuzz. The "manual update" for categories has been added for users who combine massive databases with very slow computers (e.g. notebooks or tablets) and who want some more control. I don't think that adding more options, increasing the complexity even further, will do the majority of uses any good.

Please update some of your categories manually while you are not also adding files in the background and attach the log file. This will show us how long it really takes on your box. I think it will will boil down to IMatch not getting up to speed because the disk cannot deliver the data fast enough. On a system with 8 (virtual) cores, the hard disk is the limiting factor. IMatch can calculate levels of categories and categories on all processors in parallel, if the disk can deliver the data fast enough or the data is in the file system cache already.

Your database has 160,000 files. This is considered as an 'enterprise-level' data volume my most of the competitors out there. There are limits even for what IMatch can do. Data-driven categories, collections, formulas and all the other cool things you have at your disposal in IMatch require some computing power. Not only the number of processors is important, but also that IMatch can read and write the data fast enough. On your system, disk I/O is the limiting factor. Either your virus checker is slowing everything down, or your disks are slow-ish. Consider spending some money for a SSD, it's worth it.

scw2wi

Hi Mario,

I see already improvement in my workflow and I would like to spend some more time on implementing more of your suggestions before I upload the next logfile.

*) Virus Checker

I have checked the configuration. The database is already excluded, the pictures directory is not.
I have tested to disable the complete Virus Checker, it makes no difference. So this should be ok.

*) Change to Manual Categories.

This has given the most improvement. Step 1 and 2 is done as usual over night, Step 3 is not noticeable any longer.
So I can directly start working and my major concern is already solved by that.

*) Implement Database Transactions in writeback script.

This I will do next weekend. Since there are only 2 manipulations for each file I am not expecting much improvement,
but I have another script with about 20 manipulations and here it could speed up the process.

*) Remove not needed Tag Groups

I have not tested this yet but I'm not expecting much improvement.
I'm not sure if ExifTools needs time to scan for tags in images that are not there.

*) SSD

This will bring the most benefit and I'm planning this for the next weeks.
I will post new measurements when I'm ready to make more test.

If I can still see locks and not responding application after all these steps I will upload the next logfile as suggested.
Thanks for all your valuable hints, I'm sure I can speed up all the steps now.

Walter

scw2wi

I have upgraded to SSD and are working since 1 week now with the new configuration.
The improvement is significat.

Ressource-Monitor measures a factor of 4x faster for write operations to database,
and changes happens approx. 2x faster till application is idle again.

As suggested this brought the best performance improvement
and if there is still anybody working with database not on SSD
I am also suggesting now to upgrade to SSD.

The pictures are still on internal HD
but if there are 4TB SSDs available at reasonable costs,
I will upgrade also the pictures to SSD.

Walter