Metadata write back Shift+Alt+S sooo slow

Started by mking, December 19, 2023, 07:54:49 PM

Previous topic - Next topic

mking

Hi, I am on latest 25.3.2 and have lots of files that need Metadata write back (pencil icon showing) for some reason.
I saw in another post that Shift+Alt+S will force this (can't find that any where in the docs btw) and so tried that and after a few minutes it had done 10 files and said 399 hours to go for 125,000 files :( !


Now I am on a 32 core latest threadripper with 256GB of memory and images are on a raid 16GB/sec. SSDs and system disk is 8GB/sec.

So why is the metadata write back soooo slow?

Is there a faster way to write back this metadata

Fortunately there was a cancel button!

Txs,
Mike

JohnZeman

Menu > Commands > Metadata Writeback > For All Files (SHIFT+ALT+S)
Menu > Commands > Metadata Writeback > For Selected Files (CTRL+ALT+S)

For what it's worth, whenever I have a huge number of files needing writeback I never writeback to all of them at once for the reasons you just found out.  Instead I'll select the images needing writeback in blocks of 500 or so, and write back to that block of images before moving on to the next block of 500.

Mario

#2
Quotesaw in another post that Shift+Alt+S will force this (can't find that any where in the docs btw)

When I search for the word write in the IMatch help, the first result is Metadata Write-back.

On that page, you'll see Writing Back Selected Pending Files and Writing Back All Pending Files which explain what the commands do and their keyboard shortcuts.

The Keyboard Cheat Sheet lists all frequently used keyboard shortcuts.

All menu commands list the associated keyboard shortcut:

Image1.jpg


Regarding your write-back issue:

Please ZIP and attach a log file in debug mode (Help menu > Support > ...) (The IMatch Log file) from a session where you triggered a write-back for some files.

This will show us how long the write-back takes, which steps take how long etc.
Which file formats you are processing (big difference between 1 MB JPG files and 500 MB TIFF panoramas) etc.
This will give us a minimum of information to work with.


Do you use versioning with propagation?
Which virus checker do you use?

IMatch writes back files in multiple parallel threads, depending on the core count. Maybe this causes some issues on your system?

Go to Edit menu > Preferences > Application and reduce the threads for write-back to, say, 4 or 6.
Does this change the behavior?
See  Process Control (Advanced Setting)


mking

Hi Mario,

Thanks for the fast feedback. 

I will generate the data requested when I get a chance and share it here.

Best,
Mike

janb83

#4
Mario, I think there is definitely something going on here. After you remarked that 3 hours is not normal in the other topic I tried another write-back. I actually recorded a video doing so but had the wrong settings in my capture software, so the recording is useless.

Some observations:
- When you start a write-back, it takes several minutes for the progress dialog to show up. Until that point, the CPU load is really low
- Once the progress dialog that says "Updating Metadata in files.." opens, it will be stuck at 0 for about 15-20 minutes, again with really low CPU and HDD load
- When it finally starts processing individual files, I can see 8 working ExifTool instances, but there are also several more not doing any work. CPU load is at 30-50% in this phase, RAM Usage around 70%, HDD usage quite low.

One interesting observation: Even if you immediately cancel the write-back in the progress dialog, it will still take the same 15-20 minutes to actually close the dialog that it would take to actually start processing files. So there is some algorithm taking up this time which does not get cancelled by the cancel button.

I messed up keeping track of timecodes somewhat, need to repeat the experiments, but I think during this 0 progress timeframe, there is huge amount of these calls in the debug log, nothing else:

ADMIN: Text data removed.

mking

OK so with debug logging on, running SHIFT+ALT+S it appears to have got stuck in some loop with no files processed in 12 mins. 133,000 files to go!

I am probably going to have to kill it.

Debug log zip attached - well it would be attached if there was an attached option!
How do I attach a file?

Txs,
Mike




JohnZeman

Quote from: mking on December 19, 2023, 08:51:19 PMHow do I attach a file?

You can't attach a file with the quick reply, use the regular reply which has an option for adding attachments.




janb83

As a sidenote: Why does IMatch start Reading Metadata for the files it wrote during write-back? I mean, it already knows what it wrote. This process takes another hour for the 30k files on my system :(

mking

OK here is my log file - 12 mins elapsed and nothing processed as far as I can see...
seems to be stuck on GetFileFaceCount ....

BTW I do worry that a lot of the slow down I am seeing is because of features like face recognition - which I care nothing about.

Is there an option to turn off "features" like this that some people like me do not require?

Thanks,
Mike

Mario

QuoteReading Metadata for the files it wrote during write-back?
During write-back ExifTool produces a lot of modified metadata, updated digests / checksums, time stamps, migrates data between XMP, EXIF, IPTC, GPS etc.
IMatch reading the data afterwards ensures that the database contains up-to-date data.
IMatch does nothing just for fun.

QuoteHow do I attach a file?
Click on the Preview when you use Quick Reply.
In normal reply, the attachment feature is visible always directly under the post editor.

SMF forum user help pages.

PLEASE DO NOT just dump tons of log file data in your reply.
It's unreadable on small screens and floods the community search engine with junk, which is bad for all users.
Attach a text file the next time. I have removed the code sections you have pasted since they don't contain any useful data. We need the complete log file, in debug mode.

You did not answer my questions, e.g. about your virus checker. See my post and provide the info, thanks.


Mario

Quote...e of features like face recognition - which I care nothing about.
The automatic face recognition "feature" is off by default in IMatch.
Unless your files contain face regions data in XMP embedded by some other software. Did you check that?
Also check under Edit > Preferences > Application that you have not accidentally enabled face recognition. Search for face on that page.

I will look into your log file shortly over the next couple of days.
Provide the answers to my questions in the mean time. I don't ask questions for fun.
Did you try my suggestion about processing threads? Was there any change?

janb83

Quote from: Mario on December 19, 2023, 09:30:04 PMYou did not answer my questions, e.g. about your virus checker. See my post and provide the info, thanks.

Sorry, I thought I did. I am using Windows internal virus protection only, and the photo folder has been added as an exception. All other applications that access the HDD regularly like Backblaze had been disabled/closed for this test. As I said, the 15-20 minute phase until files are actually processed (first small, not tons of, log I posted. sorry, I thought your forum search would not search code blocks by default) does not utilize CPU or HDD at all, or rather, not more than an idling IMatch. During this very long phase, IMatch is not yet modifying any files on the file system. I monitored this.

This is the phase that mking also mentions as "nothing processed" that can not be canceled by the cancel button.

Btw, my log WAS a debug log.

Mario

I've had a look at the log file.

There are some warnings about unsupported PSB and TIFF files. You've already asked about that in another thread and I've explained.

Some warnings about read-only files which cannot be written.

Then you start writing back 133,980 files!
In preparation of this massive write-back, IMatch checks for and transforms face annotations into XMP regions.
IMatch retrieves the face count from the database for each file. If there are face annotations, it converts them into XMP face regions.
Which takes about 16ms per file on your system.
That makes about 35 minutes just for this step, if I'm not mistaken.

IMatch was still busy doing this when you created a copy of the log file.
Not a single write-back was performed so we have no timing information.

Writing back almost 140K files on one go is not a good idea.
This will put your system under massive stress for a very long period of time.
I suggest you write back in batches of, say, 1,000 files or so and see how it goes.
This will also produce more manageable log files which contain the actual data we need.

janb83

Ok, this exactly explains what I was seeing as well. Yes, I have a lot of face annotations. I did not actually add/change any face annotations though since the last write-back.

I assume you are doing this because otherwise, you would overwrite the XMP with no face annotations, essentially deleting the existing ones. And you don't have the actual XMP annotations persisted anywhere in the IMatch db, so you actually have to calculate them at runtime.

Is this assumption correct? If it is, I see two potential solutions. Whether or not these are possible and which one would be better is something only you can say.

Both would require an internal flag per file storing if the face annotations were changed in IMatch.

1) Store the XMP face annotations in the IMatch database (when reading XMP and after writing new XMP) and instead of calculating them on the fly from IMatchs own internal annotations, just use this stored value, unless the annotations were changed in the meantime in IMatch

2) Instead of calculating the XMP face annotations before write-back, for the files that had no face changes in IMatch, during write-back first read the XMP from the file and preserve the existing face annotation section. This is probably the slower approach and might not be faster at all as every file has to be read first. Unless you are already doing this during write-back anyways.

The second approach would allow external face annotation manipulation not to be overwritten during write-back (which would currently happen unless you manually rescan metadata first) unless the annotations were also changed in IMatch.

janb83

Quote from: Mario on December 20, 2023, 10:28:46 AMThis will put your system under massive stress for a very long period of time.

It actually doesn't. CPU load is extremely low during the face annotation preparation step (maybe this could use some parallelization?). It's higher during actual write-back, but still not extreme (~50% for me).

QuoteI suggest you write back in batches of, say, 1,000 files or so and see how it goes.

Except for logging, this doesn't help at all. The overall duration is the same, but more manual user input is needed. So, from a user perspective, this approach is worse than just running it for all files and coming back a few hours later ;)

Mario

QuoteI assume you are doing this because otherwise, you would overwrite the XMP with no face annotations, essentially deleting the existing ones.

QuoteAnd you don't have the actual XMP annotations persisted anywhere in the IMatch db, so you actually have to calculate them at runtime.
IMatch has, but does not update them when face annotations are changed. This is postponed to the actual write-back. Makes no sense before.

The current implementation worked very well since 2016. I have no intent to change it, thank you very much. 16 ms per file is not much, and typically users write back a few dozen or maybe a few hundred files. Not 140,000 files. Compared to the actual time required to write metadata to files and synching XMP with EXIF, legacy IPTC, GPS etc, the 16ms is negligible.

janb83

Quote from: Mario on December 20, 2023, 11:14:31 AMIMatch has, but does not update them when face annotations are changed. This is postponed to the actual write-back. Makes no sense before.

But then all you need is a flag that the annotations did not change. As I said, I did not change any face annotations. So there was no update needed at all. It's updating them to the state they already have. This seems pointless.

QuoteThe current implementation worked very well since 2016. I have no intent to change it, thank you very much. 16 ms per file is not much, and typically users write back a few dozen or maybe a few hundred files. Not 140,000 files. Compared to the actual time required to write metadata to files and synching XMP with EXIF, legacy IPTC, GPS etc, the 16ms is negligible.

Well, it's still 10% of the total runtime of the write-back, and worse, it's the only part that does not make sense to the user as there is no visible progress There's really no need for the tone, we are trying to improve the software, not attack you. Keep in mind that this is probably a typical first experience for new users using IMatch as new users typically have at least thousands of images, and IMatch will almost certainly need to write-back to all of them due to bad XMP metadata. So, almost every using trialing IMatch will run into this "stuck at 0 progress" problem. How many decide against purchasing due to this?

Mario

#17
I pretty much doubt that this step accounts for 10% of the total runtime.
The typical write-back time for an image is between 0.5 and 5 seconds, depending on format, location, state of existing metadata, file format and size. IMatch writes back multiple files in parallel, but even then 0.016 seconds compared to 0.5 to 5 seconds hardly results in 10%.

Feel free to add a feature request for an improvement for this step.
This will show us how many other users have encountered this problem and if it would be worth the time and cost to somehow make it faster, skip it, provide more user feedback etc.

I refactor and enhance parts of IMatch all the time. But it makes no sense to spend a day or two on an optimization nobody will ever notice.

janb83

Quote from: Mario on December 20, 2023, 01:16:13 PMI pretty much doubt that this step accounts for 10% of the total runtime.
The typical write-back time for an image is between 0.5 and 5 seconds, depending on format, location, state of existing metadata, file format and size. IMatch writes back multiple files in parallel, but even then 0.016 seconds compared to 0.5 to 5 seconds hardly results in 10%.

Feel free to add a feature request for an improvement for this step.
This will show us how many other users have encountered this problem and if it would be worth the time and cost to somehow make it faster, skip it, provide more user feedback etc.

I refactor and enhance parts of IMatch all the time. But it makes no sense to spend a day or two on an optimization nobody will ever notice.

As I said, this issue has the worst impact on users trying out IMatch. You will not be able to estimate how beneficial it would be for your conversion rate by having existing users vote on it.

You can doubt it's 10%, but in my case it was around 20 minutes out of ~3:30h.

Mario

Attach a log file of that session, where we can see the actual write times etc so I can get a full picture.
The annotations to face regions is just a tiny part of the overall metadata processing that takes place in memory before the write-back can start.

QuoteYou will not be able to estimate how beneficial it would be for your conversion rate by having existing users vote on it.
I have no idea. No user ever mentioned this and no trial user ever contacted me about this.
I'm sure in all the years, at least one user would have asked. I just came never up as a problem, no bug report and I also don't see a related feature request. Maybe this is a not a problem for the majority of users.

Again, feel free to file a feature request.
Suggestions and feature requests embedded in general discussions will be forgotten in a week.
If you write one, we can learn if and how many other users see this as a problem. And if there is a sufficient number of users, I will spend time and look into this. Else, my to-do list is always full.

janb83

Quote from: Mario on December 20, 2023, 01:59:18 PMThe annotations to face regions is just a tiny part of the overall metadata processing that takes place in memory before the write-back can start.

Ah, earlier you made it sound like this is all that is happening before the actual write-back. My 10% comment referred to the 0 progress phase of the progress bar. If other things are done at this time too, then the face annotation issue is of course less %. I know there is another phase before it even opens the progress bar, but that was not part of my ~20 minutes.

Either way. A simple improvement would be to add another progress bar that tracks pre-write preparations per file. Imho this does not need a feature request as it's more of a bugfix (a progress bar stuck at 0 for minutes (or potentially hours with even larger amounts of files) would not pass QA at my workplace, as would the cancel button that does not cancel until minutes later).


QuoteI just came never up as a problem

Well, not quite? This topic was not started by me and has the same issue. There's also this topic. It even mentions one of the issues I mentioned explicitly ("Hitting the cancel button does not cancel the process"). You simply focused on the other issues the user had.



Mario

Hitting the Cancel button will stop the process as soon as possible.

When there are currently 4, 8, 16, ... ExifTool instances writing back data to files stored on a NAS, and each of the files also has one or more versions to write-back to after the original master file was written, Cancel cannot be immediate or we would end up with non-propagated data or worse.
IMatch cannot hard terminate ExifTool instances still consuming CPU resources or performing file I/O - this would produce damaged files. Waiting for the Cancel to finish is the better option.

QuoteEither way. A simple improvement would be to add another progress bar that tracks pre-write preparations per file.

Yes, one of the possible options. Let's add your feature request and then we'll se if this is worth investigating.

janb83

#22
Quote from: Mario on December 20, 2023, 03:28:59 PMHitting the Cancel button will stop the process as soon as possible.

No, it does not. We are talking about the phase before the actual write-back starts (during which the progress bar shows 0). If you cancel during this phase, the cancellation will only happen after it has done all the preparations. It's exactly the same time that would have passed until the counter goes to 1. It is not using ExifTool instances yet.

In other words: If I hit cancel immediately after the progress dialog opens for 30k files, it will take ~20 minutes to close. This is unrelated to waiting for ExifTool instances, which even if there were some would happen much quicker.

QuoteYes, one of the possible options. Let's add your feature request and then we'll se if this is worth investigating.
I added a feature request.

Mario


QuoteNo, it does not. We are talking about the phase before the actual write-back starts (during which the progress bar shows 0). If you cancel during this phase, the cancellation will only happen after it has done all the preparations. It's exactly the same time that would have passed
I will look into this. There should be checks for cancel in the preparation to exit early if the user decides to cancel during that phase.

janb83

Quote from: Mario on December 20, 2023, 03:48:36 PM
QuoteNo, it does not. We are talking about the phase before the actual write-back starts (during which the progress bar shows 0). If you cancel during this phase, the cancellation will only happen after it has done all the preparations. It's exactly the same time that would have passed
I will look into this. There should be checks for cancel in the preparation to exit early if the user decides to cancel during that phase.

Thanks :)

mking

#25
Hi Mario,

Many thanks for looking at my debug log.

I appreciate it is what it is in terms of performance - not what I hoped for on what is probably as fast a desktop computer you can get.

First I have confirmed I have face recognition turned off - so WHY is it bothering with face annotation info?
BTW even though I have face recognition turned off it still is detecting faces in my images - should it be doing that?

Second Its just not reasonable to do 1K at a time - that's 140 events! So I will just set it running when I go away for 3 days at Christmas and hopefully its finished when I return!

I appreciate all your efforts with iMatch and understand that you can't optimise all aspects, so please forgive my frustration when I hit a bottleneck. I could copy the data from one disk to another in probably less than 60 mins, so was not expecting this to take more a few hours.

Maybe add a warning note in the docs that Metadata Write Back can be very very very slow for 10,000s of files - like days long....
Better still in the popup based on the number of files some sort of warning about time. Right now it just says 135,000 files do you want to do this - pretty uninformative....

Thanks,
Mike



 


Mario

#26
Quotestill is detecting faces in my images - should it be doing that?
Of course. See Working with XMP Face Regions
I'm quite sure that your files contain face regions embedded by other software.

Mario

Quoteack can be very very very slow for 10,000s of files - like days long....
Hm, I'm really not sure about this number of where you got this from.

I've just written back 1,194 NEF RAW files on my 4 year old PC in 3:04m.
Database on SSD, images on RAID spinning disks.
I've stopped it with my smart phone. Full write-back with XMP->EXIF/GPS in NEF sync, XMP sidecar generation and re-import of the files are the write-back. I've stopped the clock when the queue was empty and IMatch idle.

About 3 minutes for a volume of images that requires one or two studio days or a week of vacation to produce. I'm OK with that.

And since this is fairly linear (unless you mix RAW with NEF with MP4 with 2GB TIFF files in a write-back), this means that IMatch takes about 30 minutes for 10,000 files. Quite good.

mking

HI Mario,

I am just going by what the pop-up told me.
Worst was 400hrs best was 69hrs for 135K files.

Of course I killed the process before I could find out the reality.

I can run a real test tonight on say 10,000 images and report back tomorrow.

My recent raw files tend to be large - I am using a 102mp camera and a file with layers can easily be over 4GB.

TBH you don't have to be doing anything special these days to kick over 4GB - 16bit file with a few layers and you are there.

Of course many of my older files are smaller.

Anyway we will see I will report back tomorrow evening.

Thanks,
Mike

Mario

QuoteI am just going by what the pop-up told me.
Worst was 400hrs best was 69hrs for 135K files.
The estimate is calculated based on the timing of the last 20 or 50 files, I recall. So it will vary a lot if you mix RAW files from the same camera with JPEG files, MP4 files, PDF files and whatnot in one write-back. Which is fairly uncommon, though.

QuoteI am using a 102mp camera and a file with layers can easily be over 4GB.
Well, that explains a LOT.
Maybe an all-purpose DAM like IMatch is not the best fit for you and you should talk to Haseelblad or PhaseOne or Red or whatever and consider using their specialized software?

Or, maybe consider one of the corporate-grade DAM vendors like Canto, Widen, ImageBank if they can handle your files?
They have (usually free) pre-sales consulting and I'm sure you'll find a DAM software that better fits your needs than IMatch.

Quote(...) kick over 4GB - 16bit file with a few layers and you are there.
Yeah, well...

The average file size managed by IMatch users (according to telemetry) is between 5 and 50 MB per file. Videos excluded.

So maybe the file sizes you work with are merely an exception but not the rule?
Have you considered telling your software to compress your files using LZW or ZIP (lossless)?
A 4GB TIFF file then typically shrinks down to about 500 - 800 MB. Much easier to handle, much less to backup.

ExifTool will take a sweet time to create a copy of a 4GB TIFF file, write metadata to it and when everything works 100%, delete the original file and rename the copy. This is how ExifTool works. It may be slower than directly writing to the original file, but it is also a lot safer. And 99% of the write-back time is spent in ExifTool, not IMatch.
Keeping the files you work with on a SSD until they are finished will be a massive speed boost too.

If you consider IMatch to slow, see this link for a comprehensive overview of DAM products. No affiliation. Enjoy. I'm sure you'll find something better.

mking

Quote from: Mario on December 20, 2023, 08:30:54 PMThe average file size managed by IMatch users (according to telemetry) is between 5 and 50 MB per file. Videos excluded.

Really - pretty much any camera today will generate a 25mb+ raw file.... unless your users are only saving jpegs. Did you consider this might be skewed by people who agree to telemetry! More professional users might be less willing.

Also Imatch used to be faster than alternatives - hence why I started using it more than 15 years ago. Maybe it still is but TBH I am not that interested in checking out alternatives - too invested in Imatch - just want to make it better together, if we can, with constructive feedback.

>>A 4GB TIFF file then typically shrinks down to about 500 - 800 MB. Much easier to handle, much less to backup.

Unfortunately the single threaded compression and decompression of compressed TIFF and PSB is MUCH MUCH slower than loading uncompressed files from raided SSD or even single NVME SSD. So not such a great choice these days (in fact been a bad choice for a long time).

>>If you consider IMatch to slow, see this link for a comprehensive overview of DAM products. No affiliation. Enjoy. I'm sure you'll find something better.

I think a more constructive approach would be to explain what is happening and that for example even at 5 files per second 135000 files is going to take at least 7.5 hours.  Bigger files might make this 19+ hrs. Its just a lot of files !

So let me share for the benefit of anyone reading, what I found with a few tests today.
Meta data write back time per file size 50 - 77MB was about 200msec - which is great - I guess this is limited by consumer SSD transaction speed (not CPU or raw SDD bandwidth which is much greater than this). 

3000 Files of average 188MB were slower average per file, about 370msec. - but that was still 507MB/sec. write so 1014MB/sec. read and write!  

So in conclusion, yes the its not leveraging my 14GB/sec. SSD bandwidth probably because it can't handle the transaction rate, but its not crawling along either - there are just a lot of files.

Mario, I think if you could;

a) issue a writeback warning not just about the number of files but potential time that might take - just pick a data rate say 300msec / file and warn how long this might take.

b) make it possible to cancel the writeback during that initial phase when it might take 30mins or more before you can cancel.

Hope that is helpful.

Thanks,
Mike







 






 





Mario

#31
a) The write-back dialog displays an estimate.
b) This is already available in IMatch 2023.5.4, released today.
See also https://www.photools.com/community/index.php/topic,13855.msg97632.html#new

QuoteReally - pretty much any camera today will generate a 25mb+ raw file.... unless your users are only saving jpegs.
Which is why the average is between 5 MB and 50 MB. A Z7 Raw has about 50MB.

mking

>> b) This is already available in IMatch 2023.5.4, released today.
See also https://www.photools.com/community/index.php/topic,13855.msg97632.html#new

Great - thank you for being so responsive.

Mike

mking

OK so tried the new  IMatch 2023.5.4, much faster to start doing the real work... great txs.
However the time estimate still starts in lala land.

I ran it with 11,000 files 900GB data, knowing from my other tests that this would definitely take less than 45mins, probably closer to 30mins. Actually took 34mins (SSDs got a bit warm:) .

Initial time estimate presented by IMatch though was 89 hours!  If I didn't know better that would freak me out and I would cancel it. That's where I was when I started this thread.

I think the issue is that the time estimate is based on number of files and time to write the files.
And files do take (based on debug log) 1 - 10+ sec. each to write depending on size.

BUT IMatch is running a lot of simultaneous threads here and so actual real average time is more like 200msec. per file and it takes a few hundred files before the estimate fully accounts for that.

Mario, my suggestion is that you don't put up any time expectation initially until you know you are in the ball park. Not sure how you know when that is, but TBH saying nothing is probably better than info that is out by 170x !

Hope that helps someone.

Happy Holidays.

Mike


Mario

The time display is based on a sliding window of the past n files.
Du to processor utilization, I/O, system activity, variance in file formats and file size, the estimate may vary widely over the duration. Again, writing 100,000 files at once is a very uncommon task. If you write 50 to 1000 files, like most users do, the estimate is much more betterer.