Keywords with Apostrophes

Started by erichaas, September 14, 2023, 01:20:18 AM


erichaas

I had posted about this once before, but didn't have time to follow up on it then.

When I try to add a keyword with an apostrophe to a file, the pending write-back icon shows up, and will not go away when I write back the data.

It appears that apostrophes are being changed to &#39; somewhere along the line. So, in the keyword panel, the keyword will show as boardgame|Can't Stop, but when I examine the file with ExifTool, the keyword shows as boardgame|Can&#39;t Stop.

This problem occurs with ARW (Sony Raw) files, and if I add a keyword with an apostrophe to the ARW file, it will propagate the &#39; to all (JPG) version files as well. If I instead add the keyword to a version file directly, all works as expected.

In Preferences, under Metadata, I have IPTC and EXIF Character Encoding set to default for both read and write.

Mario

I cannot reproduce this. The "Don't stop" keyword is written to the XMP file for the ARW just fine.

But the MDA reports that your ARW files contain embedded XMP data and even legacy IPTC data.
That of course makes things difficult, since IMatch does not write XMP to ARW files, and legacy IPTC data is very sensitive to character set encoding issues (which is why it was abandoned 20 years ago). An apostrophe (') may trigger character set encoding issues, and ExifTool may try to save the day.

IMatch has no feature that replaces or 'escapes' characters in keywords, so this is something ExifTool does.
Or, maybe the keyword was written that way by another software, and since IMatch does no replacement, it cannot sync the legacy IPTC keywords in the file with XMP (the XMP does not contain the &#39;).
The &#** is actually an encoding that is used by HTML, I believe. Not sure where this would come from...?
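
For reference, the &#** form is an HTML numeric character reference. Here is a minimal Python sketch, using only the standard library, of how such a reference relates to the plain apostrophe and how a keyword could be checked for it. The helper names `has_numeric_reference` and `decode_keyword` are made up for this illustration and are not part of IMatch or ExifTool; the sketch says nothing about where the escaping in this thread actually happens.

```python
import html
import re

# Matches decimal (&#39;) and hexadecimal (&#x27;) numeric character references.
NUMERIC_REF = re.compile(r"&#(?:\d+|[xX][0-9a-fA-F]+);")


def has_numeric_reference(keyword: str) -> bool:
    """Return True if the keyword contains an HTML numeric character reference."""
    return NUMERIC_REF.search(keyword) is not None


def decode_keyword(keyword: str) -> str:
    """Turn HTML character references back into the characters they stand for."""
    return html.unescape(keyword)


# Python's own HTML escaper emits the hexadecimal form for an apostrophe.
print(html.escape("Can't Stop"))

# Decoding the form reported in this thread gives back the plain keyword.
print(decode_keyword("boardgame|Can&#39;t Stop"))
```

A check like `has_numeric_reference` could be run over exported keywords to find the affected ones before deciding how to repair them.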

I recommend removing the embedded XMP and legacy IPTC data in your ARW file using the corresponding presets in the IMatch ExifTool Command Processor: "Delete XMP Metadata" and "Delete Legacy IPTC Data".
Then things will work as normal.

Having two sources of truth for metadata for a file (embedded XMP and the XMP in the sidecar file IMatch maintains) is never good. And legacy IPTC is now really a thing of the past.

If you need to have XMP and legacy IPTC data embedded in the ARW file, you will have to start experimenting with character set encodings and maybe even force IMatch to not use an XMP sidecar file for ARW but instead embed XMP data in the ARW itself. Not recommended.
-- Mario
IMatch Developer
Forum Administrator
http://www.photools.com  -  Contact & Support - Follow me on 𝕏 - Like photools.com on Facebook

erichaas

Quote: IMatch has no feature that replaces or 'escapes' characters in keywords, so this something ExifTool does.
Or, maybe the keyword was written that way by another software, and since IMatch does no replacement, it cannot sync the legacy IPTC keywords in the file with XMP (the XMP does not contain the &#39;
The &#** is actually an encoding that is used by HTML, I believe. Not sure where this would come from...?
These files were downloaded from the camera and only edited with IMatch. There shouldn't be any legacy metadata other than what the camera itself embeds in the file. I added the keywords by typing them in the Keyword Panel (or selecting them from already existing keywords).



Quote: I recommend to remove the embedded XMP and legacy IPTC data in your ARW file using the corresponding presets in the IMatch ExifTool Command Processor: "Delete XMP Metadata" and "Delete Legacy IPTC Data" presets.
Then things will work as normal.
I tried that. No luck. "Delete XMP Metadata" reports "0 image files updated; 1 image files unchanged".



Quote: If you need to have XMP and legacy IPTC data embedded in the ARW file, you will have to start experimenting with character set encodings and maybe even force IMatch to not use an XMP sidecar file for ARW but instead embed XMP data in the ARW itself. Not recommended.
No, I don't need XMP or legacy IPTC embedded in the file. I'm not even sure where it came from.


While writing this message, I did a little more experimenting. The problem only seems to happen if I propagate the metadata to the versions. If I copy an ARW file to a folder by itself, I can write keywords with apostrophes with no problems. If I copy an ARW file, along with its versions, to a folder, then when I write a keyword with an apostrophe, I get the &#39; again.

Since I'm the only one reporting this problem, I don't expect you to spend much time on it. I could live with keywords without apostrophes.

Mario

I cannot reproduce this using versioning and propagation either.
Please provide sample images (original and version) for download, or send them to the support email address (with a link to this thread), along with screenshots of the version settings you use.
My guess is that this is some very specific character set problem with existing metadata in your files.

Quote: I tried that. No luck. "Delete XMP Metadata" reports "0 image files updated; 1 image files unchanged".
In such cases, ExifTool usually also reports why the image was not updated: either a problem was encountered, or no embedded XMP / legacy IPTC data was found. I have not seen an image yet, only the ExifTool and MDA output, so I cannot really tell.
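
To narrow down which metadata group actually holds the escaped keyword, something along these lines might help. It is only a sketch: it assumes `exiftool` is on the PATH, uses the standard ExifTool options `-j` (JSON output), `-a` (allow duplicate tags) and `-G1` (show the group of each tag), and the function names `read_keyword_tags` and `find_escaped` are made up for this example. It is not what IMatch or the MDA does internally.

```python
import json
import re
import subprocess

# Matches decimal (&#39;) and hexadecimal (&#x27;) numeric character references.
NUMERIC_REF = re.compile(r"&#(?:\d+|[xX][0-9a-fA-F]+);")


def read_keyword_tags(path):
    """Ask ExifTool for the keyword-related tags of one file, keyed by group:tag."""
    cmd = [
        "exiftool", "-j", "-a", "-G1",
        "-IPTC:Keywords", "-XMP:Subject", "-XMP:HierarchicalSubject",
        path,
    ]
    result = subprocess.run(
        cmd, capture_output=True, text=True, encoding="utf-8", check=True
    )
    # With -j, ExifTool prints a JSON array containing one object per file.
    return json.loads(result.stdout)[0]


def find_escaped(tags):
    """Return only those tags whose values contain a numeric character reference."""
    hits = {}
    for name, value in tags.items():
        # A tag holding several keywords arrives as a list, a single one as a string.
        values = value if isinstance(value, list) else [value]
        bad = [v for v in values if isinstance(v, str) and NUMERIC_REF.search(v)]
        if bad:
            hits[name] = bad
    return hits
```

Calling `find_escaped(read_keyword_tags(...))` once on the ARW master and once on a JPG version (file names are up to you) would show whether the escaped form sits in the legacy IPTC keywords, in the XMP keywords, or in both.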

sinus

I personally do not use apostrophes in my pictures, especially not in keywords.
I avoid umlauts (äöü ...) as well.

IMatch does not have a problem with them, at least when I tested this some years ago. But although some sources say they are not a problem on the net (internet, social media ... also mails), I have had some problems with them here and there.

Therefore I simply do not use them (though my last name has an ä).
Maybe you could think about replacing the apostrophes with something safer, or simply not using them.

I do not like it when I must write my name (Hässig) as Haessig.
But it is what it is. Just last week I took a flight to Tenerife ... well, I had to write my name as Haessig.
So when sources say umlauts are no problem, hmmm, I do not believe them.
Best wishes from Switzerland! :-)
Markus

Mario

There are (usually) no problems. XMP uses the UTF-8 character set, which can encode every conceivable character using 1 to 4 bytes.
Problems may arise when legacy IPTC data comes into the mix. Here it depends e.g. on the character set encoding set in the file (if any) if and how non-ASCII characters like Umlauts or ' are written and read. And, of course, it depends on the writing and reading application.
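
As a small illustration of the encoding point, here is a Python sketch using only the standard library. It shows how many bytes UTF-8 needs for a few characters, and what happens when UTF-8 bytes are read back with a different character set, which is the typical legacy IPTC failure. The helper names `utf8_length` and `misread_as_latin1` are made up for this example, and Latin-1 is just one possible wrong character set.

```python
def utf8_length(char: str) -> int:
    """Number of bytes UTF-8 uses to encode the given character."""
    return len(char.encode("utf-8"))


def misread_as_latin1(text: str) -> str:
    """Simulate writing text as UTF-8 and reading it back as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


# The straight apostrophe is plain ASCII and takes one byte; other characters take more.
for char in ("'", "ä", "€", "😀"):
    print(repr(char), utf8_length(char))  # 1, 2, 3 and 4 bytes respectively

# Reading with the wrong character set garbles the umlaut.
print(misread_as_latin1("Hässig"))  # HÃ¤ssig

# The straight apostrophe survives such a mismatch, since it is the same byte in both.
print(misread_as_latin1("Can't Stop"))  # Can't Stop
```

Note that a character set mismatch of this kind produces garbled bytes, not a &#39;-style reference, so the two symptoms are worth telling apart when looking at the files.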

I have successfully stored the keyword "Can't Stop" in files.

I have used it in a version setup where keywords, including "Can't Stop", were propagated without problems from the master to the version. I used a file relation setup which propagates selected XMP metadata, including keywords.

I've set up a version to contain legacy IPTC data to see if this causes issues.
The keyword "Can't Stop" was propagated from the master to the XMP in the version (hierarchical and flat keywords) and also into the legacy IPTC keywords tag.

That is all the time I'm willing to spend on this user-specific problem.
I can look again at this if I get a set of files (master and versions) and propagation settings which cause this effect.