I need help with (presumably) corrupted files

Started by DaweP, June 21, 2014, 11:45:13 PM


DaweP

Mario,
my DB now contains about 7000 files and it works very well. However, there are about 500 files (0.5 to 1.5 years old) that exhibit some serious problems.

1) All the files are marked as "Pending Metadata write back". Writing back the metadata (keywords and rating) does not help; these files re-appear almost instantly as pending. A second, third, or fourth attempt does not help either...
2) I cannot change the rating for some of these files. For example, one photo has rating 4, which I want to change to 2. I select 2 stars, save... and it re-appears as a 4-star photo. Interestingly, another (external) application displays the new rating applied in IM5 correctly, but IM5 does not. Re-scanning did not help; even removing the file from the DB and re-importing it did not help.
3) Similarly, I cannot edit or remove the keywords. Once I hit the yellow pencil, the file reverts to its previous state.

The common feature of all these photos is that the assigned keywords include some Czech letters with diacritics. Although new photos work fine now and I can assign any Czech word without any problem, these photos had keywords assigned in the past, deep in the beta-testing era, when we struggled with this problem. I think that these files are somehow corrupted as far as the (XMP?) metadata is concerned, and this causes non-standard behavior even in the current 5.1.4... :-( So what I report here is, IMHO, not a problem (bug) in IM5, but something wrong in SOME of my files that I would like to deal with. Some legacy of the past Czech diacritics issue.

Now to my question. Please, is there any way I can "heal" those files? Some way to get rid of all previously associated keywords / ratings so that IM5 treats these files as new? Obviously, simply changing the rating and/or deleting keywords via the keyword panel in IM5 did not help.

David

jch2103

This is only a partial response to your questions, as I don't have experience with issues involving mixed character sets.

1. See Mario's response in https://www.photools.com/community/index.php?topic=2309.msg16897#msg16897 especially regarding the tool tip and the Exiftool output panel.

2. Also see the ExifTool FAQ #20 http://owl.phy.queensu.ca/~phil/exiftool/faq.html#Q20 which discusses fixing corrupted EXIF data, including cautions. (The FAQ assumes use of command line arguments.) I don't know if this repair fixes character set problems. As always, best to test on copies, not originals.


John

Mario

Which file format are you using? RAW? JPEG?
If you use RAW, does the RAW file contain embedded IPTC and/or data but IMatch is forbidden by the metadata options to write-back to the RAW file?

Please select one of your problem files in a file window and then run the ExifTool Command Processor with the "List Metadata" preset. Copy/Paste the results into a text file and attach (please do just copy/paste into your reply).

This tells us which metadata is in your file.

A typical scenario for re-appearing ratings are two competing XMP records, one in the image file itself, and one in an external sidecar file. In the default mode, IMatch merges the two XMP records, giving the XMP record in the file a higher priority. Also by default for RAW files, IMatch updates XMP data in the sidecar file, not in the image file itself. When you change the rating, it will be written to the sidecar file. But if the embedded XMP record in the file has a different rating, it will override the rating in the sidecar file.

"Other" software will most likely a) only look in the sidecar file, b) see the embedded XMP and ignore it, c) see the embedded XMP but consider the XMP in the sidecar file more important etc.

The general rule is to have only one XMP record for a file. And for RAW files, the standard says to store it in the sidecar file.
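
To make the re-appearing rating easier to picture, here is a minimal Python sketch of the behavior described above. It is only a toy model and not IMatch code: the function names `merged_rating` and `write_back_sidecar_only` are made up for illustration, and the real merge logic is certainly more involved.

```python
from typing import Optional, Tuple


def merged_rating(embedded: Optional[int], sidecar: Optional[int]) -> Optional[int]:
    """Toy model of the default merge: the embedded XMP record wins if it has a rating."""
    return embedded if embedded is not None else sidecar


def write_back_sidecar_only(new_rating: int,
                            embedded: Optional[int],
                            sidecar: Optional[int]) -> Tuple[Optional[int], Optional[int]]:
    """Toy model of the RAW default: only the sidecar is updated, the image file is untouched."""
    return embedded, new_rating


# A RAW file that carries an embedded rating of 4 and also has a sidecar file.
embedded, sidecar = 4, 4

# The user changes the rating to 2; only the sidecar receives the new value.
embedded, sidecar = write_back_sidecar_only(2, embedded, sidecar)

# On the next read the embedded record overrides the sidecar again.
print(merged_rating(embedded, sidecar))  # the old rating comes back

# With no embedded XMP record, the sidecar value is the one that is shown.
print(merged_rating(None, sidecar))
```

An application that reads only the sidecar would see the new rating in both cases, which matches the "works in other software" observation.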

See CR2 files and re-appearing ratings for some more info on this.
-- Mario
IMatch Developer
Forum Administrator
http://www.photools.com  -  Contact & Support - Follow me on 𝕏 - Like photools.com on Facebook

DaweP

The "always-pending" files are of two types, JPEG and RAW (NEF).

Below is the output of the ExifTool Command Processor (List Metadata) for one of the problematic files.
I think you are right and it is caused by a collision between the embedded data and the data in the sidecar file. But in my defense, I did not alter these advanced settings from the defaults, nor did I in the past during beta testing... :-( So how could it get to this messy state?

Should I run the ExifTool Command Processor to delete all metadata? I am hesitant to do so, as I only want to get rid of the rating and keywords, not all metadata (EXIF). On the other hand, using just the Metadata template to remove the rating and label did not help; the rating was restored again.

How do I manage to have only one sidecar file for these problematic files?

D.

ADMIN

I explicitly asked above that you should attach the output as a text file. Just copying and pasting a large amount of data pollutes the community search engine. I have my reasons when I ask you to attach a file.


QuoteCopy/Paste the results into a text file and attach (please do just copy/paste into your reply).

ADMIN
For your convenience I have removed the massive text data you included in your post, saved it to a text file and then attached it for you.




[attachment deleted by admin]

Mario

This NEF file has embedded XMP data.
This NEF file also has embedded IPTC data.
This NEF file has an embedded rating of 4.

If you have configured IMatch to write metadata only to the XMP file (this is the default), each write-back will cause a discrepancy between the XMP in the sidecar file and the metadata embedded in the file.

See CR2 files and re-appearing ratings for more info.

Solutions:

If you want to use XMP sidecar files with NEF files, strip the XMP data from the NEF.
To avoid problems caused by migration between the IPTC data embedded in the NEF and the XMP data, either allow IMatch to write-back IPTC to NEF files so it can keep the XMP data and the IPTC data in sync, or remove the IPTC data from the NEF file.
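
Before stripping anything, it may help to confirm which files really carry embedded XMP or IPTC. The Python sketch below is one possible way to do that, assuming you have saved the JSON output of something like `exiftool -j -G1 <file>` (group-prefixed tag names such as `XMP-xmp:Rating` or `IPTC:Keywords`). The function name `embedded_groups` and the sample record are invented for illustration.

```python
import json
from typing import Dict, Set


def embedded_groups(exiftool_json: str) -> Dict[str, Set[str]]:
    """Map each file in ExifTool's JSON output to the embedded groups of interest (XMP, IPTC)."""
    found: Dict[str, Set[str]] = {}
    for record in json.loads(exiftool_json):
        groups: Set[str] = set()
        for key in record:
            if ":" not in key:          # e.g. "SourceFile" has no group prefix
                continue
            group = key.split(":", 1)[0]
            if group.startswith("XMP"):  # XMP-xmp, XMP-dc, XMP-lr, ...
                groups.add("XMP")
            elif group == "IPTC":
                groups.add("IPTC")
        found[record.get("SourceFile", "?")] = groups
    return found


# Hypothetical sample shaped like `exiftool -j -G1` output for one NEF file.
sample = json.dumps([{
    "SourceFile": "DSC_0001.NEF",
    "IFD0:Rating": 4,
    "IPTC:Keywords": ["Příroda"],
    "XMP-xmp:Rating": 4,
}])

for name, groups in embedded_groups(sample).items():
    print(name, sorted(groups))
```

Note that ExifTool reports only what is inside the file it is given, so a sidecar next to the NEF does not show up in this listing unless you run ExifTool on the sidecar as well.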

DaweP

So it is obvious that there is some discrepancy in the metadata. The question is why...

Quote from: Mario on June 22, 2014, 11:18:23 AM
This NEF file has embedded XMP data.
This NEF file also has embedded IPTC data.
This NEF file has an embedded rating of 4.

If you have configured IMatch to write metadata only to the XMP file (this is the default), each write-back will cause a discrepancy between the XMP in the sidecar file and the metadata embedded in the file.

As already stated, I kept the settings at default. Always. So this means that the sidecar should be the place where IMatch writes the data. So how did the data get embedded into the image file? The NEF files are kept as originals. Some are edited in ACR and saved as JPG, and the original NEF files get XMP sidecar files as a result. But I thought that the NEF file itself is not modified in ACR. Besides, rating and keywording are done exclusively in IMatch.

Furthermore, below is a list of data from an original JPG file from a Fujifilm X20, which has not been modified at all. And although I can change and save the rating for this file, it also belongs to the "always-pending metadata-write-back" group of files... :-(

Quote from: Mario on June 22, 2014, 11:18:23 AM
See CR2 files and re-appearing ratings for more info.

Yes, I have studied the linked text. But what I still do not get (sorry :-( ) is what changes in Metadata 2 I should actually make.

All is set to Defaults in Metadata 2.
In Configure File formats, all is at Defaults as well. Which means for NEFs:
Write IPTC: No, Write EXIF: No, Allow create IPTC/EXIF/GPS: No, XMP sidecar file: Default, Use XMP crop: No, Use data in THM files: No.
And for JPG:
Yes, Yes, No, Default, No, No, (respectively) .

Quote from: Mario on June 22, 2014, 11:18:23 AM
Solutions:
If you want to use XMP sidecar files with NEF files, strip the XMP data from the NEF.
To avoid problems caused by migration between the IPTC data embedded in the NEF and the XMP data, either allow IMatch to write-back IPTC to NEF files so it can keep the XMP data and the IPTC data in sync, or remove the IPTC data from the NEF file.

I want to keep it optimal. Which - if I understand it correctly - means embedded data for JPG, sidecar files for RAW. Right? I just re-read the Help and it should be so.
So, however the problem originated, I guess that for the NEF files, I should temporarily set IPTC writing and EXIF writing to Yes. This should synchronize the data with the XMP record of IMatch, right? Then the files will be "healed" and I can return these settings to default. Please confirm or correct.

But this will have no effect on problematic JPG files, as the one listed here:


ADMIN

I have removed the pasted ExifTool dump. Please read my comments above about not just pasting this amount of text data and why. I will not make an attachment for you this time, I don't have the time to clean up your posts.



DaweP

Update:
It seems that the problem with the JPG files is related to the character encoding of the Czech keywords. For some, Latin 2 (1250) was OK, but some were saved only with UTF-8.
I am still a bit confused about which encoding is correct, as "normally" (for files outside the problematic group) I can keep it at default and it saves the Czech keywords without problems...  ???
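
For anyone wondering what such an encoding mix-up looks like in practice, here is a small Python sketch. It assumes the "Latin 2 (1250)" above means the Windows-1250 code page (`cp1250` in Python), and the keyword "Příroda" is just an example. It shows a UTF-8 keyword read as Windows-1250 and how the damage can be reversed when no bytes were lost.

```python
def misread_as_cp1250(text: str) -> str:
    """Simulate a reader that interprets UTF-8 keyword bytes as Windows-1250."""
    return text.encode("utf-8").decode("cp1250")


def repair_cp1250_mojibake(garbled: str) -> str:
    """Undo the mistake: recover the original bytes, then decode them as UTF-8."""
    return garbled.encode("cp1250").decode("utf-8")


keyword = "Příroda"                      # example Czech keyword with diacritics
garbled = misread_as_cp1250(keyword)

print(repr(garbled))                     # two characters appear for each accented letter
print(repair_cp1250_mojibake(garbled))   # back to the original keyword

# Plain ASCII keywords are identical in both encodings, which is why
# only keywords with diacritics are affected.
print(misread_as_cp1250("Praha") == "Praha")
```

The round trip only works if every byte maps to a defined Windows-1250 character; a few byte values are undefined in that code page and would raise an error instead.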

Mario

Quote from: DaweP on June 22, 2014, 12:28:33 PM
I want to keep it optimal. Which - if I understand it correctly - means embedded data for JPG, sidecar files for RAW. Right? I just re-read the Help and it should be so.
So, however the problem originated, I guess that for the NEF files, I should temporarily set IPTC writing and EXIF writing to Yes. This should synchronize the data with the XMP record of IMatch, right? Then the files will be "healed" and I can return these settings to default. Please confirm or correct.
Give it a try. The problem is that you have competing XMP records for this file, and an embedded IPTC record which cannot be updated. Ideally you should have only an XMP record for that file, in the sidecar. No embedded XMP. No embedded IPTC.

DaweP

Quote from: Mario on June 22, 2014, 08:58:01 AM
... (please do just copy/paste into your reply).

Mario, I'm sorry, I read the text too quickly and paid most attention only to the text in brackets, where one important word was missing...  :(

DaweP

Well, I gave it a try, but it did not help, so I am slowly restoring the problematic images from my backup drive and copying the category assignments from the problematic files to the newly indexed counterparts.
It is quite time-consuming, but as there is probably no other way to clean the embedded IPTC records (some script or perhaps some external application...?), I see no alternative.
Thanks for your help and I apologize once again for the extensive text pasted to my posts. As I said, I read your instructions too quickly. And to make it worse, I did not find your red-admin warning before sending the second one. I'm sorry for your wasted work :'( Entschuldigung...
D.

joel23

If you still want to delete only embedded XMP and IPTC keywords and EXIF/XMP ratings you can use this in ECP:

-overwrite_original_in_place
-IPTC:keywords=
-Xmp:Subject=
-Xmp:HierarchicalSubject=
-IFD0:Rating=
-XMP-xmp:Rating=
{Files}


But you probably want to delete all embedded XMP data, IPTC keywords and EXIF ratings; in that case use this:

-overwrite_original_in_place
-IPTC:keywords=
-XMP:all=
-IFD0:Rating=
{Files}


Just leave out the statements for any data you want to keep. For example, if you want to keep EXIF ratings, don't use "-IFD0:Rating="
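
If you need several variants of these argument lists, a small helper can assemble them. The Python sketch below uses only the ExifTool arguments already shown above; the function name `build_strip_args` and its two switches are hypothetical, and the output is meant to be pasted into the ECP followed by a `{Files}` line.

```python
from typing import List


def build_strip_args(strip_all_xmp: bool = False,
                     keep_exif_rating: bool = False) -> List[str]:
    """Assemble the ExifTool arguments from the two presets above, minus what should be kept."""
    args = ["-overwrite_original_in_place", "-IPTC:keywords="]
    if strip_all_xmp:
        # Second preset: drop the whole embedded XMP record.
        args.append("-XMP:all=")
    else:
        # First preset: drop only XMP keywords and the XMP rating.
        args += ["-Xmp:Subject=", "-Xmp:HierarchicalSubject=", "-XMP-xmp:Rating="]
    if not keep_exif_rating:
        args.append("-IFD0:Rating=")
    return args


# Example: remove all embedded XMP and IPTC keywords, but keep the EXIF rating.
print("\n".join(build_strip_args(strip_all_xmp=True, keep_exif_rating=True)))
```

As always with `-overwrite_original_in_place`, try it on copies first, because no backup of the original file is kept.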

Be aware that there is still a glitch: as long as you allow writing to IPTC after you have deleted the embedded XMP, an embedded XMP record is created again the next time you write metadata to the files and IPTC is synced.

ps
use  "-IPTC:all=" to remove all IPTC data
regards,
Joerg

Mario

Is this glitch already reported as a bug? Link?

joel23

Quote from: Mario on June 22, 2014, 03:39:12 PM
Is this glitch already reported as a bug? Link?
Yes, it's the one I asked about this morning, when I asked whether you had already downloaded the logs.
https://www.photools.com/community/index.php?topic=2506.0
regards,
Joerg

DaweP

Joerg,
thanks very much. These commands seem to work very well for my JPG files. But not so for my RAW (NEF) files (at least so far...). I don't know whether some other settings have to be changed... Writing IPTC and EXIF is already allowed for NEF file formats.
D.

Erik

You might want to be sure that your files aren't read only.  That's just a random guess.

Using the ExifTool Command Processor (ECP) is largely independent of IMatch settings, in that your settings for writing or not writing metadata don't matter. It is a shell for running ExifTool. You could probably look at ExifToolGUI (a separate program) available on the internet for a more visual approach (in that you can see and delete metadata). It might be worth it for one or two files.

I know I've had my own problems with metadata in transitioning to IM5, and ExifToolGUI is a good starting point. Once I get a grasp on what I need to do and what is wrong, I've been able to move back to IMatch and use the ECP as needed (it's faster for lots of files).
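
The read-only guess is quick to rule out for a whole batch of files. Here is a minimal Python sketch, assuming you have the list of problem file paths at hand; the helper names `is_read_only` and `read_only_files` are made up for illustration.

```python
import os
import stat
from typing import Iterable, List


def is_read_only(path: str) -> bool:
    """True if the owner write bit is cleared (on Windows this reflects the read-only attribute)."""
    return not (os.stat(path).st_mode & stat.S_IWUSR)


def read_only_files(paths: Iterable[str]) -> List[str]:
    """Return only those paths that are flagged read-only."""
    return [p for p in paths if is_read_only(p)]
```

Calling `read_only_files(...)` on the affected NEF paths returns an empty list if none of them is write-protected, in which case the cause has to be somewhere else.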

DaweP

Erik,
the files were not read only :)
Thanks for your tip concerning ExifToolGUI. Although I managed to "recover" the problematic files from my backup, your idea is worth remembering as it might be useful in the future.
D.