Figuring out "exactly" how multiple keywords are stored in metadata

Started by GrantRobertson, August 16, 2023, 01:45:59 AM


GrantRobertson

"Exactly" is in quotes because I am really only looking for a certain level of detail. 

I'm learning about using keywords within IMatch. I generally learn best by digging around in the documentation and doing lots of experiments. I had already figured out the "List separator" character (or so I thought) and how to choose it in Preferences from the online help, here, as well as several other places. So, I just left the "List separator" set to be a semicolon (;).

After adding a few keywords to a test image, I confirmed that, indeed, the list separator character that shows up in the field in the "Metadata" panel is a semicolon. I then decided to run the "All Keywords" script in the "ExifTool Command Processor." Much to my surprise, ExifTool lists those keywords with a comma as a list separator. Naturally, I was wondering why ExifTool uses a different list separator character. 

After a bit more digging, I learned from this forum post and this online help topic that these fields are actually "repeatable." If I interpret the meaning of "repeatable" correctly in this context, it means that, rather than storing the different keywords as a list within one single field in the metadata, each keyword is stored in its own copy of that metadata field, with each copy holding a different keyword. Five keywords = five copies of the field in the metadata. Is that correct?

If the above is true, then the choice of list separator character is kind of moot, because it is only used internally within IMatch (or whatever other program is reading that metadata). Naturally, one would not want to choose a list separator character that might actually show up in the data, but otherwise, my choice of list separator character for IMatch will not affect any other program accessing that data, because that list separator character (the semicolon) doesn't actually get written to the metadata of the image. At the same time, that means there would really not be any reason to change it either. 
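To make that reasoning concrete, here is a minimal Python sketch of the idea. It does not show how IMatch or ExifTool are implemented; the function names and sample keywords are made up for illustration. It assumes each keyword is held as its own value and that the separator is only applied when the values are shown or typed in.

```python
# Hypothetical model: each keyword is a separate value, as with a repeatable field.
keywords = ["beach", "sunset", "Smith, John"]


def display(values, separator):
    """Join the separate values for display only; the separator is never stored."""
    return separator.join(values)


def parse_input(text, separator):
    """Split text typed by the user back into separate values."""
    return [part.strip() for part in text.split(separator) if part.strip()]


# The same stored values can be shown with either separator.
semicolon_view = display(keywords, "; ")
comma_view = display(keywords, ", ")

# Splitting on a separator that also occurs in the data breaks a keyword apart.
round_trip_ok = parse_input(semicolon_view, ";") == keywords
round_trip_broken = parse_input(comma_view, ",") != keywords
```

The last two lines show the one practical constraint mentioned above: a separator that also appears inside a keyword (the comma in "Smith, John") cannot be split back reliably, while the semicolon view round-trips cleanly.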

GrantRobertson

Well, I found what I was looking for here. Apparently, EXIF and IPTC metadata can have multiple copies of metadata fields. IPTC has a special "Repeatable" designation for those fields that are allowed to be repeated. And they only hold one value per field. However, XMP metadata does not allow repeated metadata fields. Instead, it allows lists to be stored all in one string in one field. Get this, they use a comma as a list separator. 

So, in the end, it still doesn't matter which list separator character we choose in IMatch. ExifTool translates it into whatever the metadata field requires when it writes the data anyway.

As it turns out, that website is actually a pretty good resource for information about Exif, IPTC & XMP metadata and ICC profiles. It is the website for Exiv2, an open-source tool similar to ExifTool. The two most pertinent pages are the Metadata reference tables and the Exiv2 utility manual, which has some explanations about why certain things are done certain ways.

sinus

Thanks for that.
Seems to be complicated stuff (at least for me), but it has a lot of information. 
Best wishes from Switzerland! :-)
Markus

Mario

Just let IMatch/ExifTool do their thing ;)

IMatch has an excellent ranking from the IPTC (https://www.iptc.org/photo-metadata-support-test-results/?swid1=imatch) for conformity.

Legacy IPTC metadata (IIM) does not really matter that much anymore; it was replaced by IPTC XMP 20 years ago. But it is still maintained (though not created) by IMatch for compatibility reasons.
Keywords were never stored as a "string of text" with separators. They were always stored as some form of repeatable tags or XMP bags.
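For anyone who wants to see what an XMP bag looks like, here is a small Python sketch that reads keywords from a hand-written XMP fragment using only the standard library. The sample packet and the read_keywords helper are assumptions made up for illustration, not output from IMatch or ExifTool; the dc:subject property and the rdf:Bag / rdf:li structure are the standard XMP form for keywords.

```python
import xml.etree.ElementTree as ET

# Hand-written sample: one rdf:li element per keyword, no separator characters.
XMP_SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="">
    <dc:subject>
      <rdf:Bag>
        <rdf:li>beach</rdf:li>
        <rdf:li>sunset</rdf:li>
        <rdf:li>Smith, John</rdf:li>
      </rdf:Bag>
    </dc:subject>
  </rdf:Description>
</rdf:RDF>"""

NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_keywords(xmp_text):
    """Return the text of each rdf:li item found in the dc:subject bag."""
    root = ET.fromstring(xmp_text)
    items = root.findall(".//dc:subject/rdf:Bag/rdf:li", NAMESPACES)
    return [item.text for item in items]


keywords = read_keywords(XMP_SAMPLE)
```

Each keyword comes back as its own list entry, including the one that contains a comma, because nothing in the stored data relies on a separator character.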

The information on the Exiv2 site is quite good.
But also consider the original references, like the IPTC standard, the various XMP standards and bodies, and the JEIDA/JEITA/CIPA documentation for EXIF. And that's just (mostly) for images.