Performance of metadata write back

Started by MoeDiMa, October 11, 2014, 04:03:20 PM


MoeDiMa

Hi,

I am new to IMatch, coming from IDimager (which didn't like the big files generated by my new camera) and trying to figure out the best way for me to work with IMatch.

While experimenting with IMatch, I found a big performance issue with metadata write-back. It takes about 3 to 10 s per file, and sometimes the write-back locks up completely.

While trying to find the root cause of the slow performance, I found that exiftool rewrites the picture file twice (see the attached capture of the output window), which seems strange to me.
Also, the exiftool process sometimes seems to stop responding. After 180 s, IMatch returns with an error message stating that the metadata has not been updated. The log file says "PTWrapper hangs in ProcessRun for 180000 ms", followed by the exiftool command line (which looks basically the same as the capture from the output window). Once this has happened, metadata write-back no longer works until IMatch is restarted.

The pictures are located on a Synology NAS (DS 214), connected via Gigabit LAN. I am aware that operations on NAS storage are slower, but at the moment metadata write-back is simply not usable.

Do you have any suggestions for improving the performance, or at least for preventing the lock-ups?

[attachment deleted by admin]

Mario

Hi,

Welcome to IMatch. If ExifTool takes more than 180 seconds to write back data to a file (a typical write-back for a JPEG takes 1 to 3 seconds, for RAW files maybe 5 to 20), there's something wrong. An image file on a NAS box is certainly the worst case, but if ExifTool blocks for 180 seconds, there is usually a crash, a locked file or something similar involved.

Depending on the file format, the data to update, the Metadata and Metadata 2 options, your custom settings and MWG compliance rules, IMatch may need to update a file twice, especially if the first write causes 'new' metadata to be created.

You did not include the IMatch log file (see the IMatch help for details), so I cannot say much. Please switch IMatch to debug logging (Help > Support > Debug logging) and repeat your test.

Please also copy the file you use for your test to a local folder and add this folder to your database.
Perform the same update / write-back on the local file.

This gives us info about how long things take on your NAS compared to a local disk.
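A rough way to repeat the NAS-versus-local comparison outside IMatch is to time a full rewrite of the same file in both places. The Python sketch below is an assumption-laden stand-in, not part of IMatch or ExifTool: the function name time_rewrite is hypothetical, and copying a file and swapping the copy in only approximates what a metadata writer does.

```python
import os
import shutil
import tempfile
import time


def time_rewrite(path: str, repeats: int = 3) -> float:
    """Return the average number of seconds needed to rewrite `path` in place.

    Each pass copies the file to a temporary file in the same folder, asks the
    OS to flush it to storage, and then replaces the original with the copy.
    This is only a rough stand-in for a tool that rewrites a whole image file;
    it is not how ExifTool itself writes files.
    """
    folder = os.path.dirname(os.path.abspath(path))
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        fd, tmp_path = tempfile.mkstemp(dir=folder, suffix=".tmp")
        try:
            with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
                shutil.copyfileobj(src, dst)
                dst.flush()
                os.fsync(dst.fileno())  # flush to storage instead of the OS write cache
            shutil.copymode(path, tmp_path)  # keep the original permission bits
            os.replace(tmp_path, path)  # swap the rewritten copy in
        finally:
            if os.path.exists(tmp_path):  # only left over if something failed
                os.remove(tmp_path)
        total += time.perf_counter() - start
    return total / repeats
```

Run it once on a copy of a test image on the NAS share and once on a local copy, then compare the two averages. The explicit fsync keeps the file system cache from hiding the cost of the network round trip.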

Then ZIP and attach the log file to your reply.

joel23

Quote from: MoeDiMa on October 11, 2014, 04:03:20 PM
I am aware of slower performance of operations on NAS storage
Only when SMB/CIFS is used.
The problem might not be related to the protocol, but your NAS supports iSCSI. If you don't need to share the folders with other computers on the network, I strongly suggest switching to block-level iSCSI. Synology has a how-to on their website
(you might need to move your files before creating an iSCSI LUN).
regards,
Joerg

MoeDiMa

Thank you very much for your prompt reply!

@joel23: Good idea, but unfortunately one of the main reasons for setting up the NAS was to make the files available to different clients on the network. So iSCSI, which as far as I know allows only one client, is not really an option.

If 3 s per file is normal, I will live with it; this is similar to what I experienced with IDimager. But the double updates of the files also double this time... In the worst case I would have to change my workflow, e.g. first importing the files from the camera to a local hard drive and moving them to the NAS after the cataloging work is done.

What I still don't understand is why IMatch needs to start exiftool twice on the file (leading to a double rewrite). I could understand this for the first write-back after importing into the database, but it also happens if I later add just one keyword. Even worse, IMatch quite often claims immediately after the write-back that the file still has pending write-backs. Usually IMatch wants to write back XMP::dc\Subject again, which again triggers a double invocation of exiftool. So in total, my image files get written four times just to get rid of the pencil icon! Add to this the slow performance of the NAS...
Comparing the metadata before and after the write-back of XMP::dc\Subject shows no obvious difference. Sometimes the sort order of the keywords in the Subject fields has changed, but not always.

What worries me most is the exiftool hang. I usually catalog ~100 files and then trigger the write-back. When the issue appears, IMatch stops responding for ~300 minutes, until the 180 s timeout has expired for each file. In the end I have to kill IMatch if I want to keep working with it, and I don't like that at all, considering the impact it might have on my database.
I have attached a log file from a session where this problem occurred. I tried to reproduce it today, but (un)fortunately everything worked fine today...


[attachment deleted by admin]

Mario

From the log file it looks as if you only update JPEG files. ExifTool usually updates JPEG files in less than one minute.

Did you try the same with files in a local folder as I asked in my reply?

What do you mean by 'double write'?
IMatch may need to perform multiple -execute commands in order to ensure Metadata Working Group compliance, but that's normal.
Please open the ExifTool output panel in IMatch and perform a write-back.
Copy/Paste the output panel contents to your reply.

MoeDiMa

I have started with the JPGs in IMatch. Later I also want to include the RAWs, once I find the time to read the help topics on version/buddy files. For now, it is mostly JPGs.

I tried again with some files on the local hard disk as you suggested. There the performance is much better, about 1s per file. I was also not able to reproduce the hang of exiftool. I'll keep debug logging on to capture it if it appears again.

By 'double write' I meant the double -execute. Wouldn't it be better (from a performance point of view) if IMatch internally prepared the full metadata in an MWG-compliant way and executed exiftool only once per file?

I have attached the exiftool output for a case where, immediately after the metadata write-back (by clicking the yellow pencil), the pencil reappeared. It only disappeared after I clicked the pencil a second time.
Before the first write, IMatch reported these fields to have pending updates:
Composite\Keywords
IPTC::ApplicationRecord\Keywords
XMP::dc\Subject


Then, the pencil reappeared, claiming these fields need an update again:
IPTC::ApplicationRecord\Keywords
XMP::dc\Subject


As you can see in the screenshots of the Metadata panel, I can't notice a change in the metadata between the first and second write process.

--
Dieter

[attachment deleted by admin]

Mario

Quote
By 'double write' I meant the double -execute. Wouldn't it be better (from a performance point of view) if IMatch internally prepared the full metadata in an MWG-compliant way and executed exiftool only once per file?

IMatch uses argument files provided by ExifTool/Phil to map IPTC/EXIF/GPS/PDF data. These argument files need to be run separately. I could probably integrate all the logic contained in the args files into IMatch, at the risk of a) making mistakes and b) getting out of sync with the ExifTool distribution when Phil changes the argument files in later ExifTool versions, when the MWG comes out with new rules, or when Phil works around problems found in some formats by adding more logic to the argument files.

Please consider that when the second execute happens, Windows still has the file in the file system cache, so in almost all cases this will be an in-memory update and no additional disk writes occur.

If you don't experience the problems with local files, there is an issue with the NAS box. I have seen flawed Samba implementations on NAS boxes that produce random problems under high load. Samba implements the Windows file-sharing protocol (SMB/CIFS) on Linux, which runs on most NAS boxes. If the file system "unlinks" a file under heavy load, ExifTool will be stopped dead in its tracks, waiting for the file to come back online.

And if this takes more than 3 minutes, IMatch will stop waiting for ExifTool and abandon the process in order to protect itself. This should not happen, of course. I keep most of my sample/test databases and test file suites on a pro-grade Synology NAS system with a Gigabit cable connection. No problems.
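The protective timeout described here can be illustrated with Python's subprocess module. This is a minimal sketch that assumes the writer runs as an external process; it is not IMatch's implementation, and run_with_timeout is a hypothetical helper name. The 180 s default only mirrors the limit reported in this thread.

```python
import subprocess
from typing import Optional, Sequence


def run_with_timeout(
    cmd: Sequence[str], timeout_s: float = 180.0
) -> Optional[subprocess.CompletedProcess]:
    """Run `cmd` and give up after `timeout_s` seconds.

    Returns the finished process, or None if it timed out. On timeout,
    subprocess.run() kills the child and waits for it before raising
    TimeoutExpired, so the caller can carry on instead of hanging.
    """
    try:
        return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
```

For example, run_with_timeout(["exiftool", "-overwrite_original", "-XMP-dc:Subject+=Berlin", "test.jpg"]) returns None instead of blocking indefinitely if the file never becomes writable. A timeout only limits the damage; it does not fix whatever is wrong on the NAS side.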

MoeDiMa

Thank you for your explanation. I totally understand your point, and of course the consistency of the data has the highest priority.

Since exiftool did not hang during my latest tests, I also assume that something was wrong with the NAS box over the weekend. It's a bit strange that the problem simply disappeared without my changing anything, not even restarting the NAS box. It was easy to reproduce over the weekend, but not afterwards...

I chose a medium-grade Synology NAS for my small home network, assuming that it should have sufficient power for my needs.
I will dig deeper into running background tasks on the NAS in case the problem reappears (finally - a chance to refresh my knowledge of Linux commandline tools  ;) ).

So thank you for your great support! In the end, the help I found on this forum and your dedication to IMatch users made me choose IMatch as the replacement for my now unsupported old DAM system.

Only one question remains open: do you have an explanation for the other issue I described?
Quote from: MoeDiMa on October 13, 2014, 08:57:08 PM
I have attached the exiftool output for a case where, immediately after the metadata write-back (by clicking the yellow pencil), the pencil reappeared. It only disappeared after I clicked the pencil a second time.
Before the first write, IMatch reported these fields to have pending updates:
Composite\Keywords
IPTC::ApplicationRecord\Keywords
XMP::dc\Subject


Then, the pencil reappeared, claiming these fields need an update again:
IPTC::ApplicationRecord\Keywords
XMP::dc\Subject


As you can see in the screenshots of the Metadata panel, I can't notice a change in the metadata between the first and second write process.


sinus

I am not sure whether this fits your question, but here is a link anyway; you will be able to tell faster than I can:

https://www.photools.com/community/index.php?topic=3324.msg21897#msg21897
Best wishes from Switzerland! :-)
Markus

Mario

Quote
I have attached the exiftool output for a case where, immediately after the metadata write-back (by clicking the yellow pencil), the pencil reappeared. It only disappeared after I clicked the pencil a second time.

This can happen if XMP-dc:Subject (flat XMP keywords) differs from the legacy IPTC keywords, or if IMatch produces 'new' hierarchical keywords when importing the file. This depends on the import settings you have chosen under Edit > Preferences > Metadata and on whether you use a thesaurus.

A second write-back then flushes out the "new" keywords, synchronizing flat XMP keywords, flat legacy IPTC keywords and hierarchical XMP keywords.
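To see which of the three keyword lists are out of step, one could compare them directly. The sketch below is a hypothetical diagnostic (keyword_differences is not an IMatch function); it assumes you already have the three lists for one file, and its rule for what counts as "represented" is a simplification, not IMatch's actual logic.

```python
def keyword_differences(iptc_keywords, xmp_subject, hierarchical, sep="|"):
    """Report where the keyword lists of one file disagree.

    The flat lists are compared as sets, so a different sort order is not
    reported. A hierarchical keyword counts as represented in XMP-dc:Subject
    if either its full path or its leaf appears there (an assumption that
    covers both "write hierarchical" and "flatten" style settings).
    """
    iptc = set(iptc_keywords)
    xmp = set(xmp_subject)
    unrepresented = [
        kw for kw in hierarchical
        if kw not in xmp and kw.split(sep)[-1] not in xmp
    ]
    return {
        "only_in_iptc": sorted(iptc - xmp),
        "only_in_xmp": sorted(xmp - iptc),
        "hierarchical_not_in_xmp": sorted(unrepresented),
    }
```

With the example file posted later in this thread (dot-separated Keywords and Subject, pipe-separated Hierarchical Subject), the flat IPTC and XMP lists agree, but all three hierarchical keywords are reported as not represented in XMP-dc:Subject.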

MoeDiMa

Quote from: sinus on October 14, 2014, 10:10:14 AM
I am not sure whether this fits your question, but here is a link anyway; you will be able to tell faster than I can:

https://www.photools.com/community/index.php?topic=3324.msg21897#msg21897

Thank you for the hint. I will have a closer look at this thread tonight. At first glance, it seems to be a different case, since I used the Keyword panel and the thesaurus to add the keywords, not templates. But I will check again in more detail.

MoeDiMa

I have now found the time to look more closely into the yellow pencil appearing after a metadata write-back. In my case, it seems to be related to the migration of data from my old DAM system. IDimager wrote hierarchical keywords to XMP-dc:Subject with a dot as the hierarchy separator, e.g. Places.Germany.Berlin. IMatch is configured to also write hierarchical keywords to XMP-dc:Subject, but IMatch uses | as the separator.

After the import, IMatch has pending write-backs for the files because new metadata was generated during the import. This is normal, as Mario has explained. But by running exiftool directly on the file after the write, I found that in this step IMatch writes flat keywords to XMP-dc:Subject instead of the hierarchical keywords I would expect from the settings in the metadata preferences. XMP-dc:Subject now contains Places, Germany, Berlin. When re-importing the file after the write, IMatch detects the non-hierarchical keywords in XMP-dc:Subject and marks the file as having pending updates again. After the second write-back cycle, XMP-dc:Subject contains the expected result, Places|Germany|Berlin.
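The change that takes two write-back cycles here is, in itself, a simple separator swap. This Python sketch only illustrates that swap on a list of keyword strings; it does not touch any files, it is not how IMatch processes keywords, and convert_separator is a hypothetical name.

```python
def convert_separator(keywords, old=".", new="|"):
    """Rewrite hierarchical keywords from one separator to another.

    Assumes every `old` character is a hierarchy separator. A keyword that
    legitimately contains a dot (for example "St. Moritz") would be split
    incorrectly, so such entries need to be reviewed by hand.
    """
    return [new.join(kw.split(old)) for kw in keywords]
```

The main risk is ambiguity: a blind swap cannot tell a dot inside a keyword name from a dot used as a hierarchy separator.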

Attached are my settings from the Metadata and Metadata 2 preferences.

I am currently reconsidering if I really need hierarchical keywords in XMP-dc:Subject, so the problem might disappear for me...

[attachment deleted by admin]

Mario

1. The | is the standard separator defined by the Metadata Working Group and this is why IMatch uses it. It is compatible with a wide range of applications.

2. IMatch writes hierarchical keywords with the settings you show. Just checked.
Do you perhaps use "exclude" levels in your thesaurus? They can be used to strip unwanted levels when flattening keywords.

MoeDiMa

I have no complaint about using | as the hierarchy separator. And yes, IMatch writes hierarchical keywords, but only on the second try. Again: this does not happen with files I added fresh to the database using only IMatch. But files that already have keywords with the . separator need two update cycles until the keywords are changed from . to | notation. I would have expected this to happen in one step...
There are no 'exclude' levels in my thesaurus.

Mario

When the . is enabled as a hierarchy separator in Edit > Preferences > Metadata, IMatch will produce hierarchical keywords from the IPTC keywords in your files on import. You should be able to see them in the Keyword Panel. Is this the case?

Also check if the files already have XMP data, and flat/hierarchical keywords in XMP. You can do that with the ExifTool Command Processor in IMatch. Use these arguments:

-iptc:keywords
-xmp-dc:subject
-xmp-lr:hierarchicalSubject
{Files}


If you use RAW files with XMP sidecar files, check the XMP file for keywords as well:


-xmp-dc:subject
-xmp-lr:hierarchicalSubject
{XMPFiles}
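Outside IMatch, the same three tags can be read with ExifTool's JSON output option (-j) and post-processed. This is a sketch that assumes exiftool is installed and on the PATH; read_keywords and parse_keyword_json are hypothetical helpers, and only the parsing part works without ExifTool being present.

```python
import json
import subprocess

# Same tags as in the ECP arguments above.
TAGS = ["-IPTC:Keywords", "-XMP-dc:Subject", "-XMP-lr:HierarchicalSubject"]


def parse_keyword_json(json_text):
    """Turn `exiftool -j` output into {file: {tag: [values]}}.

    ExifTool writes a list with a single item as a plain value, and
    numeric-looking keywords may come back as numbers, so every value is
    normalised to a list of strings here.
    """
    result = {}
    for record in json.loads(json_text):
        tags = {}
        for name in ("Keywords", "Subject", "HierarchicalSubject"):
            value = record.get(name, [])
            if not isinstance(value, list):
                value = [value]
            tags[name] = [str(v) for v in value]
        result[record["SourceFile"]] = tags
    return result


def read_keywords(*paths):
    """Run exiftool on `paths` and return the parsed keyword tags.

    check=True raises CalledProcessError if exiftool reports an error,
    e.g. for a missing file.
    """
    out = subprocess.run(
        ["exiftool", "-j", *TAGS, *paths],
        capture_output=True, encoding="utf-8", check=True,
    ).stdout
    return parse_keyword_json(out)
```

For RAW files with sidecars, pass the .xmp paths instead; IMatch's {XMPFiles} variable has no equivalent in this sketch.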



On write-back, IMatch first deletes the IPTC keywords, and then writes out the hierarchical keywords, applying whatever flatten options you have defined, using your thesaurus to check for 'skip' levels. If you open the ExifTool output panel and write back one file, you can see what IMatch is doing, and how it updates the file. Attach the output to a reply if you need feedback for that.
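The flattening step can be sketched as follows. This is an illustration under stated assumptions, not IMatch's code: flatten and skip_levels are hypothetical names, and thesaurus "skip" levels are modelled simply as level names that are never emitted.

```python
def flatten(hierarchical, path_elements=False, skip_levels=(), sep="|"):
    """Derive flat keywords from hierarchical ones.

    path_elements=False keeps only the leaf of each keyword; True keeps
    every level of the path. Levels named in `skip_levels` (a hypothetical
    stand-in for thesaurus skip levels) are never emitted. Duplicates are
    removed and the result is sorted.
    """
    flat = set()
    for kw in hierarchical:
        levels = [level for level in kw.split(sep) if level]
        chosen = levels if path_elements else levels[-1:]
        flat.update(level for level in chosen if level not in skip_levels)
    return sorted(flat)
```

Using the hierarchical keywords from MoeDiMa's example file in this thread, path_elements=True yields Bayern, Blätter, Deutschland, Hintergrund, Keywords, Natur, Ottmarshausen, Places, Styles, which matches the flat lists his first write-back produced; the default setting keeps only Blätter, Hintergrund and Ottmarshausen.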

Since I cannot see the contents of your files, your database, your options and your thesaurus I can only give general tips about what to check to see why you experience this behavior.

The ECP and the output shown in the ExifTool output panel after writing back a file should be helpful to diagnose this.

MoeDiMa

Thank you for all your help!

This is the exiftool output of one example file:
Keywords                        : Places.Deutschland.Bayern.Ottmarshausen, Keywords.Hintergrund, Styles.Natur.Blätter
Subject                         : Styles.Natur.Blätter, Keywords.Hintergrund, Places.Deutschland.Bayern.Ottmarshausen
Hierarchical Subject            : Styles|Natur|Blätter, Keywords|Hintergrund, Places|Deutschland|Bayern|Ottmarshausen


Now IMatch wants to update IPTC::ApplicationRecord\Keywords and XMP::dc\Subject. The exiftool output of the write-back is attached. As you can clearly see, IMatch writes keywords to XMP::dc\Subject the way I would expect if I had enabled 'Write hierarchical keywords' and 'Write path elements' in the preferences. But 'Write path elements' is not checked, as you can see in the screenshots of my preferences attached to my previous post.

Output from ECP after the write-back:
Keywords                        : Bayern, Blätter, Deutschland, Hintergrund, Keywords, Natur, Ottmarshausen, Places, Styles
Subject                         : Bayern, Blätter, Deutschland, Hintergrund, Keywords, Natur, Ottmarshausen, Places, Styles
Hierarchical Subject            : Styles|Natur|Blätter, Keywords|Hintergrund, Places|Deutschland|Bayern|Ottmarshausen


Clicking a second time on the yellow pencil generates the exiftool output as in the second attachment, this time writing the keywords as expected.

Output from ECP afterwards:
Keywords                        : Keywords|Hintergrund, Places|Deutschland|Bayern|Ottmarshausen, Styles|Natur|Blätter
Subject                         : Keywords|Hintergrund, Places|Deutschland|Bayern|Ottmarshausen, Styles|Natur|Blätter
Hierarchical Subject            : Styles|Natur|Blätter, Keywords|Hintergrund, Places|Deutschland|Bayern|Ottmarshausen




[attachment deleted by admin]