Metadata Problem

Started by jmsantos, March 09, 2020, 09:39:23 AM


jmsantos

I think it is best to summarize the problem I am raising. For context, I will recall the discussion thread that I opened in May 2019:

https://www.photools.com/community/index.php?topic=9039.msg63466#msg63466

Quote from: jmsantos on May 15, 2019, 05:42:31 PM
Thank you for your reply, Mario.

Here is a file with embedded GPS data and Copyright and other data in XMP sidecar:
https://www.dropbox.com/s/lluatpbtna1fr90/20190510_131403_4698.zip?dl=0

This is a case from my own collection, similar to others that I expect to find in the institutional archive that I will have to manage. Depending on the option that I activate in IMatch (Default or Favor XMP sidecar), I can see either the GPS data or the author and copyright data. With IMatch I can't see both at the same time, but with LR I can.

In response to my other post you said:

Quote from: Mario on April 10, 2019, 10:18:43 AM
If your image collection has older images produced by older Adobe products or other software, you will need software like IMatch to point out the problems and allow you to fix them.
Getting the metadata straight and standard-conform is usually one of the biggest challenges when migrating to a true DAM software.

Well, I need to migrate to a true DAM and that's why I'm testing IMatch, but I can't get the metadata to be straight and standard-conform. What workflow should I follow in IMatch to resolve this, or what options should I enable? In the archive I will find many problems like this.

This was your answer:

Quote from: Mario on May 16, 2019, 12:14:12 PM
This is quite a sad example of metadata management.
The XMP sidecar file contains only a few tags:


[XMP-dc]        Title                           : 20190510_131403_4698
[XMP-dc]        Rights                          : © 2019 José Manuel Santos Madrid
[XMP-dc]        Creator                         : José Manuel Santos Madrid
[XMP-photoshop] Authors Position                : Fotógrafo
[XMP-xmpRights] Marked                          : True


This is a minimal XMP record, missing lots of tags.
And none of the tags written to the XMP record have been updated in the corresponding EXIF/IPTC/GPS records in the image itself.

When you use the default settings in IMatch, IMatch produces a new rich XMP record from the data embedded in the file. This XMP record then contains all EXIF fields, the GPS data and everything ExifTool can extract from the file and map them to XMP by applying the MWG rules. But this ignores the partial data in the sidecar file.

If you tell IMatch to favor the existing XMP sidecar file, IMatch will import metadata from the file (EXIF, GPS, legacy IPTC) but not perform any EXIF/IPTC/GPS mapping on its own if an XMP sidecar file already exists (which is the case here). This means that the XMP data imported by IMatch contains only what is in your sidecar file, which is very little.

This is how IMatch handles this situation right now:
By default, it merges the XMP data it has created from the original file with the XMP data in the sidecar (if one exists).
If you set "favor sidecar", it uses the sidecar XMP data and does not create its own XMP record.

What we would need to handle your use case is a feature to a) produce a rich XMP record from the existing data in the file, applying all MWG mappings to map EXIF/GPS/IPTC from the file into the XMP record, and then b) if an XMP sidecar file exists, merge its contents selectively with the rich XMP data produced by IMatch. Such a feature currently does not exist.
I have some ideas in that direction for IMatch 2020, though. Maybe even earlier.
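To make the idea of a selective merge concrete, here is a minimal sketch in Python. It is not how IMatch works and uses no IMatch or ExifTool API; it only assumes that both metadata sources have already been read into plain dictionaries of tag names and values. The function name `merge_xmp`, the `sidecar_wins` parameter and the placeholder values in `embedded` are hypothetical; the sidecar values are the ones listed above.

```python
from typing import Dict, FrozenSet

Tags = Dict[str, str]


def merge_xmp(rich: Tags, sidecar: Tags,
              sidecar_wins: FrozenSet[str] = frozenset()) -> Tags:
    """Selectively merge sidecar tags into a record built from embedded data.

    - Tags found only in the sidecar are added.
    - Tags found in both keep the value from `rich`, unless the tag name
      is listed in `sidecar_wins` (hypothetical per-tag override list).
    """
    merged = dict(rich)  # copy, so the input record is not modified
    for tag, value in sidecar.items():
        if tag not in merged or tag in sidecar_wins:
            merged[tag] = value
    return merged


# Record mapped from the data embedded in the image (placeholder values).
embedded = {
    "XMP-exif:GPSLatitude": "<latitude from embedded GPS>",
    "XMP-exif:GPSLongitude": "<longitude from embedded GPS>",
    "XMP-dc:Title": "<title mapped from embedded IPTC>",
}

# Partial record from the XMP sidecar file.
sidecar = {
    "XMP-dc:Title": "20190510_131403_4698",
    "XMP-dc:Rights": "© 2019 José Manuel Santos Madrid",
    "XMP-dc:Creator": "José Manuel Santos Madrid",
}

merged = merge_xmp(embedded, sidecar,
                   sidecar_wins=frozenset({"XMP-dc:Title"}))
```

With this input, `merged` keeps the GPS tags from the embedded data, gains the rights and creator tags from the sidecar, and takes the title from the sidecar because it is listed in `sidecar_wins`. Which source should win for which tag is exactly the part that needs a real decision.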

This requires a lot of testing, because there are many fringe cases. Metadata handling has become more and more complex over time, and many options have been added to handle fringe cases and user-specific workflows. Camera vendors have started to add partial XMP records to RAW files, etc. I need to give this a re-think for IMatch 2020.

I raised the question again several posts above, in case IMatch 2020 had solved this issue. This post relates to that:

Quote from: jmsantos on March 06, 2020, 08:15:14 PM
Quote from: Mario on March 05, 2020, 09:23:08 AM
Can you run the Metadata Analyst app on one of your files so we see what they contain?
If I understand the old thread correctly, your files have a mix of XMP and native metadata, in wrong places and out-of-sync?

Yes, I have run Metadata Analyst, although it reports something I already know.
I think it would be necessary to implement a tool to combine the metadata embedded in the file (such as GPS data) and the data in the XMP file. Or does this already exist in IMatch? I don't know.

I do not understand your answer:

Quote from: Mario on March 07, 2020, 08:51:35 AM
Quote: "combine the metadata embedded in the file (such as GPS data)"
IMatch imports native GPS data into XMP during ingest. So the GPS data you see in IMatch is what's in the file, unless you have modified the GPS data in IMatch but not written it back.

Sorry for the length, but I think the conversation was rather scattered in that large thread.

Mario

It makes no sense to discuss all sorts of topics, including metadata problem threads reaching back to 2019, in a catch-all thread about IMatch 2020.
It is already impossible to follow the multiple sideline discussions in that thread, and appending more and more instead of opening a new thread makes no sense.

I have split your post into a new thread so I can look at it again when the 2020 frenzy is over.

jmsantos

I waited a bit to check if this topic was taken up again. I don't know if the IMatch 2020 frenzy has passed.

In any case, it is true that the original metadata query dates back to 2019. However, at that time you answered: "I have some ideas in that direction for IMatch 2020" and "I need to give this a re-think for IMatch 2020".

Have you developed those ideas in IM 2020? Is this matter postponed sine die?

Thanks.

Mario

Still on my look-into list. Not forgotten!
But it is apparently a pretty peculiar problem, and currently I'm looking primarily into problems affecting more than one user.