What is the difference between applying and copying keywords?

Started by joel23, June 22, 2014, 02:18:44 PM


joel23

IPTC character encoding is set to UTF-8 for both read and write.
When applying keywords via the panel, all IPTC keywords with umlauts are okay.
When I copy them from one CR2 to another, words with umlauts are corrupted (in IPTC, and from there they propagate to the XMP tags as new keywords).
See the attachment: the words "Länder" and "Säugetiere".

PS:
When IPTC character encoding is set to "Default", I get the keywords like this (both forms):

Länder, LÃ¤nder, Säugetiere, SÃ¤ugetiere,

regardless of whether no IPTC:CodedCharacterSet record exists or it is set to UTF8.

IMatch log entry always is:
ExifTool: Using code page 1252
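
A keyword showing up in two forms is the typical symptom of UTF-8 bytes being read with a single-byte code page. A minimal Python sketch, assuming code page 1252 as named in the log entry; the variable names are illustrative only and nothing here reflects how IMatch or ExifTool work internally:

```python
# UTF-8 bytes misread as code page 1252 produce the doubled-character form.
word = "Länder"
utf8_bytes = word.encode("utf-8")          # b'L\xc3\xa4nder'
misread = utf8_bytes.decode("cp1252")      # two characters where "ä" was

# The other direction: cp1252 bytes are not valid UTF-8, so a lenient
# decoder substitutes U+FFFD for the lone 0xE4 byte.
ansi_bytes = word.encode("cp1252")         # b'L\xe4nder'
replaced = ansi_bytes.decode("utf-8", errors="replace")

print(misread)
print(replaced)
```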




[attachment deleted by admin]
regards,
Joerg

Mario

Why did you change the default encoding?
If your files don't contain IPTC data in UTF-8, you now create a mix of ANSI and UTF-8, depending on whether only some of the data is written or all of it.
ExifTool is usually good at detecting the proper character encoding and should not be forced to override it.

When you change keywords, ExifTool writes only the keywords to the IPTC record. This means that whatever character set the rest of the IPTC record uses, the new data must be written in the same character set. Otherwise you end up with a mix of UTF-8 encoded and ANSI data in the IPTC record, which is impossible to repair.
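
To illustrate why a mixed record is so hard to recover, here is a small Python sketch. It assumes a heavily simplified "record" (two keyword values joined by a `|` separator, which is not the real IPTC layout) and code page 1252 as the ANSI encoding; `try_decode` is a hypothetical helper.

```python
# Hypothetical record: one keyword written as cp1252, one written as UTF-8.
old_keyword = "Länder".encode("cp1252")
new_keyword = "Säugetiere".encode("utf-8")
record = old_keyword + b"|" + new_keyword

def try_decode(data: bytes, encoding: str) -> str:
    """Decode the whole record with one encoding, or report failure."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return "<undecodable>"

as_utf8 = try_decode(record, "utf-8")    # fails on the cp1252 keyword
as_ansi = try_decode(record, "cp1252")   # succeeds, but garbles the UTF-8 keyword

print(as_utf8)
print(as_ansi.split("|"))
```

No single encoding reads both values correctly, and nothing in the record says which value used which encoding.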

When you copy IPTC data between files, the same rules apply. You did not mention how you copy, but ExifTool/IMatch usually never replace the entire IPTC record; they just update it. To change the character encoding, the record needs to be rewritten and copied. There are samples in the ExifTool FAQ showing how to do it, if you must.
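
For orientation, the kind of command the ExifTool FAQ describes for this can be sketched as below. This is written from memory of the FAQ entry on character encodings, so treat the exact arguments as an assumption and check the FAQ before use. The Python wrapper, the function name `build_iptc_to_utf8_command`, and the file name `a.jpg` are illustrative only. The sketch only builds and prints the command; it does not run ExifTool.

```python
import shlex

def build_iptc_to_utf8_command(path: str) -> list:
    """Build (not run) an ExifTool call that rewrites a file's IPTC as UTF-8."""
    return [
        "exiftool",
        "-tagsfromfile", "@",        # copy tags from the file onto itself
        "-iptc:all",                 # rewrite all IPTC tags
        "-codedcharacterset=utf8",   # mark the rewritten record as UTF-8
        path,
    ]

command = build_iptc_to_utf8_command("a.jpg")
print(shlex.join(command))
```

Try this on copies of the files first, since the whole IPTC record is rewritten.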

joel23

Quote from: Mario on June 22, 2014, 07:59:44 PM
Why did you change the default encoding?
To force UTF-8. To be honest, I believe it is more trustworthy to force it (in this case).
Quote
If your files don't contain IPTC data in UTF-8 you now create a mix of ANSI and UTF-8, depending if only some of the data is written or all.
I know, but they do contain IPTC in UTF-8. I didn't convert my IM 3 DB but started from scratch, so IM 5 should have taken care of the IPTC records it creates for RAW files, especially because I have used these settings from the very beginning.

It seems I wasn't as precise as I should have been in my OP: in general I am talking about freshly generated IPTC records, generated by IMatch for RAW files.

Quote
When you copy IPTC data between files, the same rules apply. You did not mention how you copy, but ExifTool/IMatch usually never replace the entire IPTC, just update it. To change the character encoding the record needs to be re-written and copied. There are samples in the ExifTool FAQ how to do it. If you must.
Okay. All of my files are fine, especially those that have a CodedCharacterSetTag written by PS (JPG, PSD, TIF). I only noticed this when I tried to copy keywords once.

But here's how to reproduce it:

Set IPTC character encoding to UTF-8 for read and write. Import a fresh CR2, and let IMatch generate XMP (sidecar) and also IPTC. It seems IMatch generates the IPTC record in UTF-8, but it does not write a CodedCharacterSetTag (as it should per MWG, as I mentioned earlier).

Take a JPG with a CodedCharacterSetTag set to UTF-8 (since IMatch does not create a CodedCharacterSetTag, take one created by PS or saved by Geosetter). Make a copy, remove the CodedCharacterSetTag from the copy, and import both. After import, mark them as read-only, just in case.

Now copy the keywords (CTRL-C -> CTRL-SHIFT-V) from the second JPG (no CodedCharacterSetTag) to the CR2: all good.
A conversion might already have happened for the source file on import. I'm not sure, but I don't really think so; it looks like UTF-8 when viewed in ET GUI. Anyway, IMHO this is not that important for what follows.

Again, start with a fresh CR2. After ingesting etc. is done, copy the keywords from the first JPG, which has its CodedCharacterSetTag set to UTF-8, to the CR2.
This is where I get strings like "TORuZGVy" ("Länder" in the source file) and "U+R1Z2V0aWVyZQ==" ("Säugetiere" in the source file) for the CR2. Whatever charset that might be.
This is certainly not a simple ASCII/ANSI <-> UTF-8 conversion problem, because for an "ä" we would get "Ã¤", or an ä in the other direction. I am mentioning and highlighting this because you haven't looked at the attachment in my last post yet.
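
One observation on the two corrupted strings: both happen to be valid Base64, and decoding them yields the original words as code page 1252 bytes. The short Python check below shows this. It would be consistent with some step in the copy Base64-encoding bytes it could not read as UTF-8, but that interpretation is an assumption and is not confirmed anywhere in this thread.

```python
import base64

# The two corrupted keyword values exactly as reported for the CR2.
corrupted = ["TORuZGVy", "U+R1Z2V0aWVyZQ=="]

decoded = []
for value in corrupted:
    raw = base64.b64decode(value)      # raw bytes behind the Base64 text
    text = raw.decode("cp1252")        # interpret them as code page 1252
    decoded.append(text)
    print(value, "->", raw, "->", text)
```

The raw bytes contain a lone 0xE4 for "ä", which is valid in code page 1252 but not valid UTF-8.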

Here you said:
Quote
There is no CodedCharacterSetTag in the IPTC record, which means that it is encoded in a local code page and must be read as such.

I believe there is a huge discrepancy here. You said, and the help file says, that the "Default" IPTC character encoding setting always means IM writes UTF-8. But IMatch does not write the CodedCharacterSetTag, so what does it write, and what does it assume later? If it writes UTF-8 (with the default or the forced UTF-8 setting) but fails to create a CodedCharacterSetTag, does IMatch assume IPTC is in the local code page the next time it touches IPTC?
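
The discrepancy can be sketched as follows. In the IPTC IIM standard, the CodedCharacterSet dataset (1:90) marks UTF-8 with the escape sequence ESC % G. The function `read_iptc_text` below is a hypothetical reader that follows the rule quoted above (no tag means local code page); it is not IMatch's or ExifTool's actual logic, and the code page 1252 default is an assumption taken from the log entry earlier in the thread.

```python
from typing import Optional

UTF8_MARKER = b"\x1b%G"  # ESC % G: IPTC 1:90 value that designates UTF-8

def read_iptc_text(value: bytes,
                   coded_character_set: Optional[bytes],
                   local_code_page: str = "cp1252") -> str:
    """Decode as UTF-8 only when the marker is present, else use the local code page."""
    if coded_character_set == UTF8_MARKER:
        return value.decode("utf-8")
    return value.decode(local_code_page)

# A keyword written as UTF-8, read back with and without the tag.
stored = "Säugetiere".encode("utf-8")
with_tag = read_iptc_text(stored, UTF8_MARKER)
without_tag = read_iptc_text(stored, None)

print(with_tag)
print(without_tag)
```

Under that rule, UTF-8 data written without the tag is misread the next time the record is touched.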

Anyway, this is not what I'd expect when copying metadata with forced UTF-8 settings for IPTC, when the IPTC record has been created under the same conditions.

BTW: the batch processor also doesn't write an IPTC:CodedCharacterSetTag.
regards,
Joerg