IPTC migration issues

Started by Joe Austin, June 10, 2014, 02:52:08 PM

Joe Austin

I am trying to sort through keyword metadata issues associated with bringing my 3.6 images into a new IMatch 5 database.

In 3.6 all of my images were cataloged using the IPTC editor for captions, keywords, etc.

When I ingest images from 3.6 into my 5 test database, in the metadata/keywords panel I get keywords in these fields (I'll assign them letters to save typing):

A - XMP::Lightroom\hierarchicalSubject\HierarchicalSubject
B - Composite\Keywords\Keywords
C - XMP::dc\subject\Subject
D - IPTC::ApplicationRecord\25\Keywords

My Metadata configuration has enabled 'Write hierarchical Keywords', 'write path elements', and 'Don't replace existing hierarchical keywords'
My Metadata2 configuration is using all defaults (MWG compliance, defaults for specific file extensions, etc.)

All four of these fields have the same keywords in 5 after ingestion of a new image from the 3.6 database.



But, when trying to manipulate the keywords I find that:

When I try to delete all of the keywords (in the keywords panel) for an image, the keywords are deleted from A and C momentarily, but seconds later (I have immediate write-back enabled), the data that was in B and D is automatically restored to A and C.

If I delete some keywords in the keywords panel, but not all, and commit the changes, then the change is reflected in A and C, and remains that way. This however leaves me with a difference between the keywords in A/C and B/D. If I add new keywords, they are added to A/C only, again leaving a difference between A/C and B/D.

If I enable 'Read/write IPTC' in the Metadata2-extensions section (Canon cr2 and crw) then fields B/D get updated in the same way as A/C.

I read this in the help file:
"When you prevent IMatch from updating existing IPTC/EXIF data in files, the XMP metadata record and the IPTC/EXIF records may end up containing different data. And this may cause IMatch to replace newer data in XMP with older data imported from IPTC/EXIF on import or re-scan operations. Usually you disable writing IPTC/EXIF only if the image has no IPTC/EXIF record, and when you have good reasons to do so."
But the default setting for cr2/crw files is to have IPTC read/write turned off.  ???

If I have IPTC records embedded in my IMatch 3.6 images, does that mean I should turn IPTC read/write on for those raw extensions so that the data remains consistent across all keywords fields?
I was hoping to avoid writing to the images themselves and keep all metadata in XMP sidecars as much as possible once migrated to IMatch 5.
Is my only other option to leave 'IPTC read/write' turned off in metadata2-file extensions, and accept that my A/C keywords fields may differ from my B/D fields, and that I cannot delete all keywords without causing the re-import of the original IPTC keywords?

I am not clear on how IMatch 5 decides when, or when not, to import IPTC (IIM) records for updated images, but I assume that this conflict will exist wherever there is a legacy IPTC field that is duplicated in XMP.

As an experiment I found that if I remove the value from field D using ExifTool, then this problem vanishes (with IPTC read/write turned off in metadata2-cr2,crw) and fields B and C all fall in step with changes made to A.   I suppose this presents the option of purging all the embedded (IIM)IPTC keywords once the files are ingested, but that means writing to tens of thousands of images which I would rather avoid.

Have I missed something in my reading and observation?  Are the options I've laid out the only ones that I have?

Joe Austin

Well, after finding this post:

https://www.photools.com/community/index.php?topic=2025.msg12840#msg12840

it appears the preferred option is to use ExifTool to remove the IPTC application data after the files have been ingested and the XMP has been populated from the imported legacy IPTC data. At least that's how I read it.

Does this need to be done for all of the tags in the IPTC application record to prevent inconsistency with XMP data that will be modified later, or a possible re-import of IPTC data if XMP data is removed? In my testing the xmp/dc/description field does not seem to be refreshed by the IPTC/ApplicationRecord/Caption-Abstract field the way that keywords are.


Mario

In IMatch you only modify hierarchical keywords, which are a superset of all other keywords.
When you write back, IMatch copies hierarchical keywords into XMP-dc:Subject (regular XMP keywords) and also into IPTC:keywords. Composite keywords are a virtual tag produced by ExifTool in not always clear ways. So just ignore it.

When IMatch writes back keywords, it updates the hierarchical and regular keywords in XMP. It also updates the keywords in legacy IPTC. Then IMatch reads the data back from the file to update the database with the current metadata contents of the file (writing XMP and IPTC data may change more than just the written data, e.g. update digest information, EXIF data).

This all works well if IMatch is allowed to synch data across all metadata standards (EXIF, IPTC, XMP, GPS). IMatch applies the Metadata Working Group rules which specify how metadata has to be mapped between IPTC/EXIF and XMP.

The problem with re-appearing keywords (or other data) always happens under these conditions:

1. Your files have keywords in the legacy IPTC record
2. IMatch is not allowed to update the IPTC record on write-back

If IMatch updates only XMP but cannot update the existing IPTC record in the file, the data will be out-of-synch after the write back. When MWG compliance is on, IMatch will re-import keywords from IPTC into XMP. But the IPTC keywords are the old keywords still. And there is the problem.
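The behaviour described above can be illustrated with a toy model. This is only a sketch of what the thread reports, not IMatch's actual implementation: the function name `write_back`, the dictionary layout and the rule "empty XMP keywords are refilled from legacy IPTC" are assumptions chosen to reproduce the observations in the first post.

```python
def write_back(file_md, new_keywords, may_write_iptc):
    """Toy model of a keyword write-back followed by the read-back.

    file_md        -- dict with an "xmp" keyword list and, optionally, an "iptc" list
    new_keywords   -- the keywords the user wants after editing
    may_write_iptc -- whether the legacy IPTC record may be updated
    """
    md = {group: list(words) for group, words in file_md.items()}

    # Write-back: XMP is always updated.
    md["xmp"] = list(new_keywords)

    # Legacy IPTC is only updated when that is allowed and the record exists.
    if may_write_iptc and "iptc" in md:
        md["iptc"] = list(new_keywords)

    # Read-back (assumed rule): empty XMP keywords are refilled from
    # whatever is still stored in the legacy IPTC record.
    if not md["xmp"] and md.get("iptc"):
        md["xmp"] = list(md["iptc"])

    return md


legacy = {"xmp": ["beach", "sunset"], "iptc": ["beach", "sunset"]}

deleted_all = write_back(legacy, [], may_write_iptc=False)       # old keywords come back
deleted_some = write_back(legacy, ["beach"], may_write_iptc=False)  # XMP and IPTC now differ
in_sync = write_back(legacy, [], may_write_iptc=True)            # both records end up empty
```

In this model the stale IPTC list is the only thing that brings deleted keywords back, which matches the two conditions listed above: remove the IPTC record, or allow it to be updated, and the effect disappears.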

Solution:

Allow IMatch to update and synch existing IPTC data in your files.

If this is not possible for some reason, delete the existing legacy IPTC data from your files. Your files then have only the XMP record (and maybe EXIF data). Please note that some XMP fields are synchronized between XMP and EXIF too, e.g. date and time information. You can run into the same problem as with IPTC keywords if your files have EXIF data, you change XMP fields which need to be synchronized between XMP and EXIF, but you don't allow IMatch to update the EXIF record in your files.

I know it's complicated. I explained it in great detail in the help. And most of the Metadata 2 options and the per file format metadata options deal exactly with this problem. The cleanest workflow is to let IMatch do its thing and to keep all metadata in synch. Removing the legacy IPTC metadata record is also a good idea, unless you deal with systems or clients which still require the old IPTC data.

All the rest is explained in the help.


Joe Austin

Thanks, Mario

So the option you would recommend is to override the default behavior in metadata2, and let IMatch write IPTC/EXIF to .cr2/.crw files so that the data can be synchronized? (This does seem to be the closest option to using the IPTC Writer in 3.6)

But the help file page on metadata2 configuration seems to strongly suggest keeping the defaults, and in the post I quoted above you noted the exiftool-IPTC removal option as 'better' ???

Mario

This is only needed when your CR2 files contain legacy IPTC data. Which they should not. Hence the default is off.

Joe Austin

So in the case where I am bringing in many files cataloged in 3.6 and containing legacy IPTC, but I don't wish to continue using embedded legacy IPTC in raw files in the future, would the best process be:

1.  Leave IPTC writing turned off in metadata2 for CR2 files.
2.  Ingest the old files from 3.6 into IMatch 5.
3.  Use ExifTool to remove the IPTC Application Record data.
4.  Use only xmp in the future for all new and old files.


Mario

Once IMatch has imported the files and created the XMP record, you can strip the IPTC.
Try with a set of test files and make sure that all your workflow is XMP-savvy before stripping the IPTC.

Or, if you have used IPTC in CR2 before, continue to do so and allow IMatch to write back to CR2 files. Personally, I would get rid of legacy IPTC.
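For anyone who wants to script the strip step outside IMatch, here is a minimal sketch that only builds the ExifTool command line. The helper name `build_strip_command` and the file names are made up for illustration; `-IPTC:all=` (delete every tag in the IPTC group) and `-overwrite_original_in_place` are standard ExifTool options. Run it on backed-up test files first, and only after the XMP record has been written.

```python
def build_strip_command(paths):
    """Return the ExifTool argument list that deletes all IPTC tags from paths."""
    paths = list(paths)
    if not paths:
        raise ValueError("no files given")
    return ["exiftool", "-IPTC:all=", "-overwrite_original_in_place"] + paths


# Hypothetical file names, for illustration only.
command = build_strip_command(["IMG_0001.CR2", "IMG_0002.CR2"])

# To actually run it (requires ExifTool on the PATH):
# import subprocess
# subprocess.run(command, check=True)
```

Building the argument list separately from running it makes it easy to print and review the command before any file is touched.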

Erik

Quote from: Joe Austin on June 12, 2014, 06:50:41 PM
So in the case where I am bringing in many files cataloged in 3.6 and containing legacy IPTC, but I don't wish to continue using embedded legacy IPTC in raw files in the future, would the best process be:

1.  Leave IPTC writing turned off in metadata2 for CR2 files.
2.  Ingest the old files from 3.6 into IMatch 5.
3.  Use ExifTool to remove the IPTC Application Record data.
4.  Use only xmp in the future for all new and old files.

From my own testing and experiences, make sure you've written out pending XMP updates before you delete the IPTC record from the file.

I would suggest testing this on a few files that you have backed up and using the ExifTool Command Processor (ECP) to verify that the XMP record is what you expect and that the IPTC record is actually gone (not really an issue). It's worth verifying, as it is easy to forget the metadata write-back and accidentally end up losing all the metadata in a file.
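The check suggested above could look roughly like this. It is a sketch under assumptions: the input is taken to be the JSON that `exiftool -j -G1 -XMP-dc:Subject -IPTC:all FILE` prints, the function names are hypothetical, and the sample record is invented rather than taken from a real file.

```python
import json


def as_list(value):
    """ExifTool's JSON gives a bare value for one keyword and a list for several."""
    if value is None:
        return []
    if isinstance(value, list):
        return [str(v) for v in value]
    return [str(value)]


def check_stripped(record, expected_keywords):
    """Return a list of problems found in one ExifTool JSON record."""
    problems = []
    xmp = as_list(record.get("XMP-dc:Subject"))
    if sorted(xmp) != sorted(expected_keywords):
        problems.append("XMP keywords differ from expected")
    leftover = sorted(key for key in record if key.startswith("IPTC:"))
    if leftover:
        problems.append("IPTC tags still present: " + ", ".join(leftover))
    return problems


# Invented sample output for one file that still carries legacy IPTC keywords.
sample = json.loads("""
[{"SourceFile": "IMG_0001.CR2",
  "XMP-dc:Subject": ["beach", "sunset"],
  "IPTC:Keywords": ["beach", "sunset"]}]
""")

problems = check_stripped(sample[0], ["beach", "sunset"])
```

An empty result means the XMP keywords match what you expect and no IPTC tags were reported for that file.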

Joe Austin

Good caveats, and I am aware of them. In fact, I am testing out ExifTool commands now that will do the IPTC stripping on my test database/images and will post the command here when I have collected all the tags in the command and have it working the way I think it should.

I wonder how long it's going to take to strip the IPTC from 25,000 files.

Erik

Try it on small batches at first.

I've been doing something similar to remove some XMP records that have duplicate values in them (copying the XMP record, stripping it, and then rewriting) and it can take a while.

However, the longer part is after you exit the ECP.  IMatch needs to rescan the files because the modifications happen "outside" the db, which triggers a rescan.  That can take a similar amount of time as it does when you ingest files. 

In other words, it may take a bit of time for 25,000 files.

Ferdinand

Quote from: Erik on June 13, 2014, 05:07:37 AM
Try it on small batches at first.

+1.  I'd be interested in your experiences with putting such a large number of files through the ECP.  I found batches of around 500 files best, but that would be tedious for 25,000 files.  If I put more than 500 files through, you could see the ECP getting slower and slower.  I also tried the "run once for each file" option and it didn't really help, if anything it was worse.
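If the stripping is scripted rather than done by hand in the ECP, splitting the file list into batches is simple. A small sketch; the function name `batches` and the file names are hypothetical, and the batch size of 500 is simply the figure mentioned above.

```python
def batches(items, size=500):
    """Yield consecutive slices of items with at most size entries each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical file names standing in for a 25,000-image collection.
files = ["IMG_%05d.CR2" % n for n in range(25000)]
chunks = list(batches(files, 500))
```

With 25,000 files and 500 per batch this comes to 50 runs.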

mastodon

I didn't migrate, I just imported the IPTC fields into XMP and then let metadata write-back run. Unfortunately, all "ő" characters became "?", even in the JPGs. I have now changed the IPTC field handling from default to UTF8, but nothing changed. What can be done?
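One possible explanation, offered as an assumption and not as a diagnosis of what happened in these files: "ő" (U+0151) does not exist in the Latin-1 character set, so if text passes through a Latin-1 conversion at any point, the character cannot be represented and is typically replaced with "?". Once that has been written, switching the setting later cannot restore the original character. The snippet below only demonstrates the encoding effect itself.

```python
text = "ő"  # U+0151, LATIN SMALL LETTER O WITH DOUBLE ACUTE

# Latin-1 has no code for this character, so it is replaced on encoding.
as_latin1 = text.encode("latin-1", errors="replace")

# UTF-8 can represent it and round-trips without loss.
as_utf8 = text.encode("utf-8")
round_trip = as_utf8.decode("utf-8")
```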

Joe Austin

My testing database and image repository are on a separate drive from my production database. Because of the struggles I have had in planning out my migration to IMatch 5, I am planning to copy all of the images to the testing drive, import them into the testing database, and go through the whole configuration, IPTC stripping, version setup, etc. to make sure it all works the way I want before I create a new database on my production drive and aim it at my real images.

Viscus

I have the same problem with over 9000 JPG files. Some are copies of RAW files. The only solution for now is to remove the IPTC keyword content and sync the information back from the RAW files.
Then I have to do the same for my older JPG files from my Coolpix camera. I spent some hours of trial and error on my IPTC mess, along with reading here in the support forum to understand how things work. The help is great, but there is a lot to understand.
And syncing the IPTC records back to the files is not a good idea. I can support Mario's arguments.

I couldn't find anywhere how to remove every IPTC record, and a template was missing too. I'm working with the German version of IM5, so it's a little harder to find the right button and option. After this I have to correct the time zone in the time fields.

So, for others, I used:

-overwrite_original_in_place
-IPTC=
{Files}