IPTC2 block in some files leads to keyword problems

Started by Ferdinand, May 14, 2014, 04:05:39 PM


Ferdinand

This is probably more of an ExifTool question, but I'll ask it here first as I'm going to try to use IMatch to clean it up.

I've finally migrated my production DB to V5.  A big job, as I had to clean up the keywords first.  The idea was not to have any pending writebacks after converting the database.

However I had 1,500 files with writebacks.  How was this possible after all the testing?

Well, for some of them it's possible because these files have two blocks - IPTC and IPTC2.  The good data is in IPTC2, but it seems that ExifTool is feeding both to IMatch.  I've no idea how this happened.
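For anyone who wants to check their own files for the same thing, here is a minimal sketch. It assumes you have saved the text output of `exiftool -a -G1 -s yourfile.jpg` (which prefixes each tag with its family 1 group, such as `[IPTC]` or `[IPTC2]`). The function names and the sample text are made up for illustration, and the parsing is only as good as my guess at the output layout.

```python
import re
from collections import defaultdict

# Matches lines such as "[IPTC2]         Keywords      : birds, heron"
# as printed by "exiftool -a -G1 -s" (layout assumed, check against your output).
LINE_RE = re.compile(r"^\[(?P<group>[^\]]+)\]\s+(?P<tag>\S+)\s*: ?(?P<value>.*)$")


def iptc_groups(exiftool_output):
    """Collect tags per IPTC-like group (IPTC, IPTC2, ...) from exiftool text output."""
    groups = defaultdict(dict)
    for line in exiftool_output.splitlines():
        match = LINE_RE.match(line)
        if match and match.group("group").startswith("IPTC"):
            groups[match.group("group")][match.group("tag")] = match.group("value")
    return dict(groups)


def has_duplicate_iptc(exiftool_output):
    """True when more than one IPTC group (e.g. IPTC and IPTC2) is present."""
    return len(iptc_groups(exiftool_output)) > 1


# Hypothetical sample output, not taken from a real file.
SAMPLE = """\
[ExifIFD]       ISO                             : 200
[IPTC]          Keywords                        : old, garbage
[IPTC2]         Keywords                        : birds, heron
"""

if __name__ == "__main__":
    print(has_duplicate_iptc(SAMPLE))
    print(iptc_groups(SAMPLE))
```

Running this over a batch of saved dumps would at least tell you which files carry a second block and what each block holds, before deciding which one to trust.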

I found a thread here about how to fix this, but so far no luck:
http://u88.n24.queensu.ca/exiftool/forum/index.php?topic=2516.0

I'm going away to think about how best to fix this.  Anyone else had this problem?

[repeat after me - metadata is a mess!]

Mario

Interesting. Can you send me a few sample files for my test collection?

ExifTool folds the data in this (rather unusual) case, but I cannot say what is folded where, or which fields/records take precedence.
Multiple IPTC records were used if the individual record exceeded the 64K boundary, e.g. in JPEG files.

Ferdinand

I've just uploaded a sample file to the ftp server.  There's just one jpg - the others all seem to be similar.  If you want more then you'd better scream soon, as I'm about to fix them.

You can see for yourself, but there seem to be two complete IPTC blocks, rather than the second carrying over from the first past the 64k limit.  It doesn't look like there's enough metadata there to break the 64k limit, and a complete ExifTool dump is only about 16k.  The first IPTC block has outdated/garbage data, whereas the most recent data is in IPTC2.

I think what I'll do to clean this up is to wipe the IPTC block completely and repropagate from the master, which doesn't seem to have this problem.  I have written a little propagation script to tide me over until that orientation issue is sorted out.
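As a rough sketch of the same idea done with ExifTool directly rather than with IMatch propagation: wipe the IPTC data in the derivative, then copy the IPTC tags across from the master. This is not the script mentioned above. The helper name and file names are hypothetical, and I am assuming that `-IPTC:all=` clears the duplicate block as well, so try it on copies first. By default the code only builds the commands and runs nothing.

```python
import subprocess


def build_cleanup_commands(derivative, master):
    """Return the two exiftool command lines: wipe IPTC, then re-copy it from the master."""
    wipe = ["exiftool", "-overwrite_original", "-IPTC:all=", derivative]
    copy = [
        "exiftool",
        "-overwrite_original",
        "-tagsFromFile", master,  # read tags from the master file
        "-IPTC:all",              # copy only the IPTC tags
        derivative,
    ]
    return [wipe, copy]


def clean_file(derivative, master, dry_run=True):
    """Print the commands; only execute them when dry_run is False."""
    for cmd in build_cleanup_commands(derivative, master):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # requires exiftool on the PATH


if __name__ == "__main__":
    # Hypothetical file names, dry run only.
    clean_file("version.jpg", "master.tif")
```

Keeping the wipe and the copy as separate steps makes it easy to inspect the file in between and confirm that both IPTC and IPTC2 are really gone before anything is written back.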

IMatch 3.6 tends to hide all manner of metadata issues, mostly caused by inconsistent user practice.  IMatch 5 on the other hand not only shines a bright light on them, it shoves them in your face.  All 1,500 of them.  Only a few seem to be caused by IPTC2, which wasn't really my fault as far as I can see.  But the rest of them ...   :-[