Purging all User Metadata from embedded XMP/IPTC and putting them into Sidecars

Started by Lukas52, April 07, 2025, 04:58:15 PM

Lukas52

Hi,

I am (finally) switching away from my ancient photo library to something more modern.
Joy of joys, my new service supports XMP! This means no need to write XMP tags back into IPTC to support outdated software, and FINALLY I can use sidecar files.
To me that's huge, since my tags do change from time to time and my backup solution is based on checksums, which get invalidated every time the file changes (including metadata).

What I want to do now is remove as much metadata as possible from my files, keeping only basics like resolution etc., and put everything else into a sidecar file. That way, when something changes, only the sidecar files would get written into my backup, reducing backup time and bulk a lot.
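
The checksum argument can be illustrated with a minimal Python sketch. It uses throwaway in-memory byte strings rather than real images; the `sha256_hex` helper and the fake "metadata" bytes are made up for illustration and say nothing about how IMatch or any backup tool works internally.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 checksum of a byte string as hex."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-ins for a media file: pixel data plus an embedded tag block.
pixels = b"\x00\x01\x02" * 1000
embedded_v1 = pixels + b"<keywords>hollyday</keywords>"
embedded_v2 = pixels + b"<keywords>holiday</keywords>"  # typo fixed in place

# Embedded metadata: fixing the typo changes the checksum of the whole file.
embedded_changed = sha256_hex(embedded_v1) != sha256_hex(embedded_v2)

# Sidecar metadata: the media bytes stay untouched, only the small sidecar changes.
media_changed = sha256_hex(pixels) != sha256_hex(pixels)
sidecar_changed = sha256_hex(b"hollyday") != sha256_hex(b"holiday")

print(embedded_changed, media_changed, sidecar_changed)
```

With embedded metadata the large file counts as modified; with a sidecar only the small file does.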

I have already figured out that I can configure IMatch to create the sidecar files in the file format configuration in Metadata 2, but is there an easy way to remove all the already written metadata from my files?

I would also like to avoid writing duplicate metadata into IPTC and XMP. I used to do that on purpose, since my old library doesn't support XMP, but now it's just bloat. I think there is an option for that somewhere, but I can't quite remember...


Mario

QuoteWhat i want to do now is remove as much metadata as possible from my files, only keeping basics like resolution etc. and put everything else into a sidecar file.
That's not a "decision" users make.

The XMP standard, the IPTC and industry standards define where metadata is stored.
EXIF, IPTC, GPS and legacy IIM3 IPTC go into the image file.
XMP goes into the image file for formats like JPG, PNG, TIF, GIF, DNG, PSD, PSB, MP4, MOV and some others.
All other file formats, including RAW formats, use XMP in sidecar files.


Quotehave already figured out that i can configure Imatch to create the sidecar files in the File format configuration in Metadata 2, but is there a easy way to remove all the already written Metadata from my Files?

DO NOT, I repeat, DO NOT fiddle with file format settings, XMP storage options and other things. These are not supported by me and exist only for some industrial and governmental users who need these for very specific purposes, in controlled environments.

QuoteI would also like to avoid writing duplicate Metadata into IPTC and XMP.
Why would you willfully break established industry standards?

Trying such things can have severe consequences, including unavailability of metadata in other applications and services, data loss during re-import into IMatch and similar.

Why not just stick to the standards, do what works and let IMatch and ExifTool take care of the metadata?
In the past, users have tried such, eh, "custom" approaches to metadata storage and all regretted it. And I gleefully let them hang with their problems, because I told them that I don't support this and that breaking standard compliance will hurt them.

The only thing that can make sense for private users without specific requirements of clients and agencies is to strip legacy IPTC metadata (IIM3, not the XMP variant!) from image files. This avoids issues like clipped text contents and character set issues. The ECP has a preset for that.
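
For readers who work with ExifTool directly rather than through the ECP preset, the idea can be sketched roughly as follows. This is an assumption about an equivalent command, not a description of what the preset does internally, and `photo.jpg` is a hypothetical file name. In ExifTool, the `IPTC` group refers to the legacy IIM block, while the XMP variant lives in groups such as `XMP-iptcCore` and is left alone.

```python
import subprocess


def strip_legacy_iptc_command(path: str) -> list[str]:
    """Build an ExifTool call that deletes the legacy IPTC (IIM) block only."""
    # "-IPTC:All=" assigns an empty value to every tag in the IPTC group,
    # which removes the block. XMP groups are not touched.
    return ["exiftool", "-IPTC:All=", path]


def strip_legacy_iptc(path: str) -> None:
    """Run the command (requires exiftool on PATH; rewrites the file)."""
    subprocess.run(strip_legacy_iptc_command(path), check=True)


print(strip_legacy_iptc_command("photo.jpg"))
```

By default ExifTool keeps the untouched original next to the rewritten file with an `_original` suffix, but trying this on copies first is still the safer route.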

Lukas52

Literally every media hosting service will strip all embedded Metadata from files (or just ignore them after they have been read) since handling a changing file just because you added a tag is a pain. Especially thanks to tags like "derived from IDs" that even give identical copies different checksums...

Quote from: Mario on April 07, 2025, 06:23:13 PMThat's not a "decision" users make.

The XMP standard, the IPTC and industry standards define where metadata is stored.
EXIF, IPTC, GPS and legacy IIM3 IPTC go into the image file.
XMP goes into the image file for formats like JPG, PNG, TIF, GIF, DNG, PSD, PSB, MP4, MOV and some others.
All other file formats, including RAW formats, use XMP in sidecar files.

All of those pretty metadata standards are really meant for archiving (or at least that's how it feels); they are a pain to work with from a filesystem perspective...

Sidecar files are a compromise (with downsides, of course), but I don't get why this should be a problem.

Rewriting a 400 GB MP4 just to add a tag. Yeah.

Quote from: Mario on April 07, 2025, 06:23:13 PMWhy would you willfully break established industry standards?

I doubt having the same tags written 4 times in the same file is industry standard. It only does that with my tags though; everything else is written only as XMP. I remember making that change on purpose, since my old web library only supported IPTC keywords from a specific field that has been considered legacy for more than a decade. I just can't remember what exactly I did :)
Fiddling with which metadata fields get written and which don't is optional. The main thing I care about is having a media file that does not change its filesystem checksum every time its metadata changes.

Quote from: Mario on April 07, 2025, 06:23:13 PMDO NOT, I repeat, DO NOT fiddle with file format settings, XMP storage options and other things. These are not supported by me and exist only for some industrial and governmental users who need these for very specific purposes, in controlled environments.
Technically I am a user in a controlled environment. Those tags are only for my new web gallery. Inside the IMatch DB, things can stay as standards-conformant as they can be.

jch2103

Quote from: Mario on April 07, 2025, 06:23:13 PMThe only thing that can make sense for private users without specific requirements of clients and agencies is to strip legacy IPTC metadata (IIM3, not the XMP variant!) from image files. This avoids issues like clipped text contents and character set issues. The ECP has a preset for that.
I think this may have been one of the main things that the OP was thinking about (see OP's last paragraph). The ECP preset does indeed address this. The OP should definitely heed Mario's warnings, though.

Although 'bloat' may be a concern, metadata generally takes up very little space in image files/sidecar files. The important thing to watch out for is keeping consistency with metadata standards and trying to avoid internally conflicting metadata (e.g. avoiding legacy IPTC metadata).
John

Mario

Do as you please. But don't come and ask me for support when metadata stuff breaks. I won't support it.

QuoteRewriting a 400 GB MP4 just to add a tag. Yeah.

I see no problem with this. I just wrote back a 590 MB MP4 file on a spinning disk and ExifTool took 3.2 seconds and IMatch another 1.5 seconds for preparing and re-ingesting the file to update the database.

4.7 seconds for a ~600 MB MP4 file on a 5 year old computer is quite tolerable, I suppose.
How often do you have to do this, anyway? You process the video, add metadata, write back and archive it.

Lukas52

Quote from: Mario on April 07, 2025, 07:11:07 PMI suppose. How often do you have to do this, anyway? You process the video, add metadata, write back and archive it.
In my case, just changing a tag name means 8 TB of data that now needs to be rewritten, which in turn triggers a backup, which causes 8 TB of additional space to be taken up, even though all I did was fix a typo.
I plan on changing some older, more rigid tagging structures into something that works better. 27 TB of data. Fun. So now that I have to do it anyway, I was hoping I could finally get rid of embedded metadata :(
All I need is for IMatch to work properly without changing my files :)

I understand that from an archival perspective sticking to the guidelines makes total sense, but no one on the IT infrastructure side ever does, because, again, it's a massive pain. The amount of effort that results from a tiny change is just too much. Just think about data-center usage. Every byte written costs electricity, SSD wear and processing time. You only need to comply with standards when you are pushing things back to the user or into the archive.


Mario

I doubt that this is a problem many users will ever face.
Changing metadata after a file has been archived is really rare. Edit, add metadata, archive.

Quotewhich in turn triggers a backup, which causes 8 TB of additional space to be taken up,
Wow! What kind of backup do you use? Full-file backups only?

Modern backup software like Macrium Reflect or True Image backs up only the sectors that actually changed (512-byte blocks) for differential and incremental backups.

When I change, for example, a 2 GB IMatch database file, the incremental backup needs a couple of megabytes, tops.
A differential backup records all changes made since the initial 2 GB backup. If I change 5 million bytes of the file on each of 10 successive days, the incremental backup is about 60 MB. Otherwise (with full-file backups) I would end up with 20 GB of backup for the same file.
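
The arithmetic behind these numbers can be sketched as follows. The file size, daily change and number of days come from the post; the roughly 60 MB quoted there is presumably the 50 MB of changed data plus some bookkeeping overhead, which is an assumption on my part.

```python
FILE_SIZE = 2 * 10**9        # 2 GB database file, from the post
CHANGED_PER_DAY = 5 * 10**6  # 5 million changed bytes per day, from the post
DAYS = 10

# Full-file backups: the whole file is copied again on every day it changed.
full_copies = FILE_SIZE * DAYS

# Incremental backups: each day stores only that day's changes.
incremental_total = CHANGED_PER_DAY * DAYS

# Differential backup on the last day: all changes since the initial full backup
# (upper bound, assuming the daily changes never hit the same blocks twice).
differential_last_day = CHANGED_PER_DAY * DAYS

print(full_copies / 10**9, "GB of full copies")
print(incremental_total / 10**6, "MB of incrementals in total")
print(differential_last_day / 10**6, "MB for the day-10 differential")
```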

My weekly "full backup" volume is about 3 TB (compressed from about 5 TB of data). Incremental backups maybe 20 to 50 MB a day, depending on how busy I was.

Lukas52

Both backup solutions you mentioned use proprietary backup formats and rely on Windows. I do neither. Also, I doubt they use blocks that small; 4 KB has been the standard even on Windows for quite a while now. But that doesn't really matter anyway.

I do use rsync and local snapshots, but I work with entire files on my backup for the best possible compatibility.

Just so you understand where I'm coming from: my job is to make huge projects as efficient as possible. I optimize server farms to use less energy, last longer and, in the end, make more money that way. I do the same for commercial buildings. Efficiency in all things. Sometimes (especially for my own little projects) that urge to find the perfect solution escalates a little.

I feel like this is getting a bit off topic, and I don't want to have an argument about metadata standards, since I do feel like they have every reason to be and exist the way they do. They just don't fit my use case :) I'm also not that knowledgeable when it comes to end-user software in general, since that's just not my cup of tea.

To get back on topic:

1. IMatch uses XMP sidecar files by default for other formats, so why could using them for JPG or MP4 be a problem?

2. And just to figure out whether I changed something in IMatch or not: where would I expect my tags to be written? What fields are the "default" that IMatch needs to work properly?

I appreciate that you are sticking around for this long even though you made it clear there is no official support for this. I'm just trying to figure out what the problem is :)

Mario


Quote1. IMatch uses xmp sidecar files by default for other formats, so why could using them for jpg or mp4 be a problem? 
Because that's non-standard.
Also note that many XMP fields are linked to the native EXIF/GPS/IFD data in your JPG file, and IMatch has to keep XMP and the native metadata in sync anyway - else you would have two sources of truth. Forcing IMatch to use external XMP for JPG will do nothing regarding your backup volume.

Quote2. And just to figure out if i changed something in IMatch or not, where would i expect my Tags to be written? What fields are the "default" that IMatch needs to work properly?
Don't follow.
IMatch knows about 15,000 tags. XMP has several thousand tags in namespaces like IPTCCore, IPTCExt, Dublin Core, photoshop, to name only the most commonly used ones. There are also many tags which exist in different namespaces and must be synced. Changing XMP data also requires checksums (digests) and timestamps to change.

Lukas52

Quote from: Mario on April 08, 2025, 10:16:54 AMBecause that's non-standard.
Also note that many XMP fields are linked to the native EXIF/GPS/IFD data in your JPG file, and IMatch has to keep XMP and the native metadata in sync anyway - else you would have two sources of truth. Forcing IMatch to use external XMP for JPG will do nothing regarding your backup volume.

The things that are changing about my metadata are only things like Description and Keywords, possibly Person in Image and Face Tags. None of those should have EXIF counterparts?

EDIT: Turns out Description does have an EXIF counterpart... That kinda ruins my plans... Do .mp4/.png have EXIF data as well? It's the most annoying of them, to be honest.

Quote from: Mario on April 08, 2025, 10:16:54 AMDon't follow.
IMatch knows about 15,000 tags. XMP has several thousand tags in namespaces like IPTCCore, IPTCExt, Dublin Core, photoshop, to name only the most commonly used ones. There are also many tags which exist in different namespaces and must be synced. Changing XMP data also requires checksums (digests) and timestamps to change.
Currently all my Keywords go to:
XMP Adobe Lightroom\Hierarchical Keywords
IPTC ApplicationRecord\Keywords
XMP Dublin Core\Keywords

Is this expected/default behavior?
It's even more chaotic for the Description. That's written into:
EXIF:ImageDescription
EXIF:UserComment
Xmp:Description
Xmp:UserComment
Xmp:ImageDescription
Iptc:Caption-Abstract

I suspect I enabled some sort of backwards compatibility at some point that causes XMP fields to also get written back into IPTC fields.
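
One way to double-check lists like the ones above outside of IMatch is to ask ExifTool which metadata groups hold each tag. The following is only a sketch: it assumes `exiftool` is on the PATH, the helper names are made up, and `photo.jpg` would be a hypothetical file. `-j` (JSON output) and `-G1` (prefix each tag with its specific group) are regular ExifTool options; note that ExifTool calls the Dublin Core keywords field `Subject`.

```python
import json
import subprocess

# Tag names as ExifTool spells them.
TAGS = ["Keywords", "Subject", "HierarchicalSubject",
        "Description", "ImageDescription", "UserComment", "Caption-Abstract"]


def build_command(path: str) -> list[str]:
    """Build an ExifTool call that prints the tags above as JSON with group names."""
    return ["exiftool", "-j", "-G1"] + [f"-{tag}" for tag in TAGS] + [path]


def locations(exiftool_json: str) -> dict[str, list[str]]:
    """Map each tag name to the metadata groups that contain it."""
    record = json.loads(exiftool_json)[0]
    found: dict[str, list[str]] = {}
    for key in record:
        if ":" in key:  # skip "SourceFile", which has no group prefix
            group, tag = key.split(":", 1)
            found.setdefault(tag, []).append(group)
    return found


def inspect(path: str) -> dict[str, list[str]]:
    """Run ExifTool on one file (read-only; requires exiftool on PATH)."""
    out = subprocess.run(build_command(path), capture_output=True,
                         text=True, check=True)
    return locations(out.stdout)
```

Calling `inspect("photo.jpg")` returns a dictionary from tag name to the groups it was found in, which makes duplicates across EXIF, IPTC and XMP easy to spot.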

Mario

Description is mapped to EXIF UserComment.

QuoteDo .mp4/.png have EXIF Data as well?
Some do. Some camera vendors (smart phones) write EXIF into MP4. Regarding PNG, I don't think so. PNG has its own way to store metadata and can use embedded XMP.


QuoteCurrently all my Keywords go to: 
Is this expected/default behavior?
Yes.


QuoteI suspect i enabled some sort of backwards compatibility at some point that causes XMP Fields to also get written back into IPTC Fields.
Precisely. It is required to avoid having multiple sources of truth.
Timestamps, descriptions, keywords, GPS coordinates should not differ between EXIF / IPTC / GPS and XMP. Else a software that does not support XMP (many!) will see e.g. a different description or date and time than a software that does. Allowing metadata to get out-of-sync is a telltale sign for sub-par metadata handling.

EXIF may contain a lot of data not available in XMP (especially camera-specific maker notes), which is useful and should be maintained.

Legacy IPTC data should be maintained and updated if it exists, but no longer created. The last point has been argued a lot between the IPTC and the late Metadata Working Group and industry giants like Adobe. I think it's good to let it die. It was created 30 or more years ago, and those were different times. IPTC replaced IIM3 IPTC with XMP 20 years ago for a good reason.

IMatch and ExifTool take care of all of that automatically. Most of the expert settings, which are only visible when Expert Mode is enabled, exist only to allow certain users of IMatch to do specific things. 99% of the user base should never change any of these settings, which is why they are hidden by default.

I recommend Metadata for Beginners in the IMatch help.

Lukas52

Quote from: Mario on April 08, 2025, 11:09:32 AMPrecisely. It is required to avoid having multiple sources of truth.
Can't I just tell IMatch that its database is the source of truth and everything else can be discarded?

IMatch is the only software I use to manage metadata; everything else is read-only.

You may not like this, but I also found out I can just stop IMatch from writing EXIF/IPTC to begin with. Not sure if that is a smart idea though, since I don't fully understand how IMatch reads the metadata back from a file.

All i need is IMatch to be able to work with this Data.

Mario