Problems with character encoding in 2.2.4

Started by emef, September 11, 2014, 10:17:59 PM

emef

Since switching to the latest version, I have had problems with my accented characters (French, of course ;-)). When the data is read again, the text is displayed as "cabalistic" (garbled) characters instead of accented letters. I did not have this problem in previous versions of IM. :-[
TIA.

Mario

Nothing has changed in this area for a very long time.
First, make sure all character-set related settings are set to Default under Edit > Preferences > Metadata.
You did not specify which metadata tags are affected, how you get data into these tags (IMatch? Other applications?), or whether these tags are mapped to XMP from existing IPTC/EXIF data in your files. More info is required.
-- Mario
IMatch Developer
Forum Administrator
http://www.photools.com  -  Contact & Support - Follow me on 𝕏 - Like photools.com on Facebook

emef

Mario, my character sets have been set to UTF8 (ever since my first beta of IM5).

The affected text is the copyright, i.e. the text that appears in "Browser / Exif / Owner of copyright", "Browser / IPTC Application Record", "Browser / XMP Dublin Core / copyright", "Browser / XMP / XMP rights" and "Browser / Copyrights / copyright", as well as in "Default / Description / Copyright", under "Image Info /.../ Owner of copyright and / Copyright notices", and in "/ XMP / xmpright / usage terms /".

I always import photos from Lightroom (current version 5.6). On import I apply the copyright text, and I do nothing else at this stage. Then I open IM. By the way, I just noticed something else: when I open the metadata of a photo I have not yet changed in IM, the text appears as entered in LR, i.e. the accented characters are rendered correctly and legibly. It is when IM writes metadata, for example when I record a GPS position or keywords (categories etc.), that the problem appears. I just re-tested it: I place the red cross on the map and click the green "v" to apply the location. Nothing changes, but the pencil icon appears; when I click the pencil, the coordinates are written and my copyright text degrades. :-[

Sorry if this was a little long.

emef

My metadata templates, if it helps     ;-)

[attachment deleted by admin]

Mario

Do your files contain legacy IPTC?
Is the legacy IPTC written in UTF8 and marked as such?
If not, ExifTool will assume your local code page when mapping IPTC to XMP (unless you change that in IMatch).
For EXIF, the same applies. See my exhaustive comments in the corresponding help topic (Edit > Preferences > Metadata). Very important.

Please attach one of your files so I can see what it contains. Such character set issues are usually caused by character set problems in the legacy IPTC data contained in the file (written in the local code page but marked as UTF8).
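To make the point about the UTF-8 marker concrete, here is a minimal Python sketch (not ExifTool's actual code). It assumes a simplified reader that decodes as UTF-8 when the IPTC says so and otherwise falls back to the local code page, taken here to be Windows-1252 (cp1252) as on a Western European Windows system. The function name read_iptc_string is hypothetical.

```python
# Hypothetical copyright text, stored by a legacy tool in the local code page.
text = "Tous droits réservés"
legacy_bytes = text.encode("cp1252")  # é is stored as the single byte 0xE9


def read_iptc_string(raw: bytes, marked_utf8: bool, local_codepage: str = "cp1252") -> str:
    """Decode an IPTC string according to its UTF-8 marker (simplified model).

    If the record claims UTF-8, decode as UTF-8 and turn invalid bytes into
    U+FFFD; otherwise fall back to the assumed local code page.
    """
    if marked_utf8:
        return raw.decode("utf-8", errors="replace")
    return raw.decode(local_codepage)
```

Bytes written in cp1252 but flagged as UTF-8 come out with replacement characters, while the same bytes read with the correct code page are fine. How ExifTool itself treats invalid UTF-8 may differ from this model.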

emef

I can post one NEF that has been processed with IM and another that has only been processed with Lightroom... but where?
This forum restricts the size of attachments, and each NEF is over 25 MB.

cytochrome

I had a lot of problems with diacritic characters, but this has been solved for several months. However, from time to time I see some resurfacing, especially the French éèçà, in folders I haven't touched recently.

Almost always, the IPTC UTF8 flag is not set in these files. When I set it (ECP with -overwrite_original_in_place / -codedcharacterset=utf8 / {File.FullName}), the files get updated and the problem is solved.

I used legacy IPTC in these old files and let IM write/update IPTC.

Francis

emef

cytochrome: thank you for your response, but could you be more explicit about your solution? I tried copying your syntax as-is, but I get an error in the ECP. :o
Thank you in advance.

cytochrome

Hello,

In the ECP make a preset "Convert to UTF-8" like this:

-overwrite_original_in_place
-codedcharacterset=utf8
{File.FullName}

Select your files, check "Run for each file in selection", then run and close.

The files should update and show the correct character encoding. At least I hope so; it almost always works here.
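For anyone who prefers to run the same fix outside IMatch, the preset above corresponds roughly to this Python sketch. It assumes the exiftool command-line tool is installed and on PATH; {File.FullName} is simply replaced by the file path, and the helper names are hypothetical. Note that -overwrite_original_in_place does not keep a backup copy, so back up your files first.

```python
import subprocess
from typing import List


def build_utf8_flag_command(path: str) -> List[str]:
    """Return the exiftool argument list used by the ECP preset."""
    return [
        "exiftool",
        "-overwrite_original_in_place",  # rewrite the file itself, no *_original copy
        "-codedcharacterset=utf8",       # set the IPTC CodedCharacterSet tag to UTF8
        path,                            # plays the role of {File.FullName}
    ]


def set_iptc_utf8_flag(path: str) -> None:
    # check=True raises CalledProcessError if exiftool exits with an error.
    subprocess.run(build_utf8_flag_command(path), check=True)
```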

Francis

emef

Alas, this solution does not work for me. The command duplicates the text, with its wrong encoding, in each of the fields in question.
Meanwhile "XMP xmpRight / Usage Terms" is still (and always has been) perfectly fine, even before running the "Convert to UTF-8" command.

See screenshot attached.
Thank you for your patience.

[attachment deleted by admin]

cytochrome

Ah, sorry to see this. Some time ago, towards the end of the beta period, I had a lot of trouble with diacritic characters. Part of the mess came from outside IMatch: I use Photomechanic for ingestion and for filling in the author and location metadata tags, and I was not aware it was not set to write Unicode.

I experimented a lot with the IPTC metadata character encoding (read and write). I think the problem is that IM (well, ExifTool) reads Latin-1 Western European (what most programs output when in French mode) but on write-back writes UTF-8 without ever setting the IPTC UTF-8 flag. On read-back this gives gibberish.
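The read/write asymmetry described here can be modelled in a few lines of Python. This is a sketch under the stated assumption (read as Latin-1, write as UTF-8, flag never set), not what ExifTool literally does; the function name is hypothetical.

```python
def roundtrip_without_flag(raw: bytes) -> bytes:
    """One edit cycle: read as Latin-1, write back as UTF-8, flag left unset."""
    text = raw.decode("latin-1")  # reader assumes Latin-1 because no UTF-8 flag is set
    return text.encode("utf-8")   # writer emits UTF-8 but still sets no flag


original = "Francis éèçà".encode("latin-1")
after_one_edit = roundtrip_without_flag(original)

# The next reader again assumes Latin-1, so each accented letter shows up
# as two characters ("Ã" followed by another symbol).
shown_next_time = after_one_edit.decode("latin-1")
```

The bytes after one edit are perfectly valid UTF-8; the gibberish comes only from the next reader assuming Latin-1 because the flag is missing.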

I don't use LR, so I cannot comment further. The multiple write-back may indicate that something is inconsistent in your Metadata 2 file format configuration options.

Francis

emef

Here is my setup for "Metadata 2"; I do not quite understand what I would need to change.
Sure, I can go back over the corrupted files in Lightroom to correct them (I tried, and it works), or change the way I work: first enter the data in IM, then add the copyright text in LR.

If none of this solves the problem, maybe I will try reinstalling an older version of IMatch (this problem did not exist two versions ago).

[attachment deleted by admin]

Mario

Quote from: cytochrome
I experimented a lot with the IPTC metadata character encoding (read and write). I think the problem is that IM (well, ExifTool) reads Latin-1 Western European (what most programs output when in French mode) but on write-back writes UTF-8 without ever setting the IPTC UTF-8 flag. On read-back this gives gibberish.

The reasons and the solutions are explained in the help for Edit > Preferences > Metadata.

cytochrome

Well, I give up...

My settings are very similar to yours; the only difference is that for XMP sidecars I use "Embed XMP in file". I do not use LR and normally don't set Copyright, but I tried it with two external applications, Nikon's ViewNX 2 and Photomechanic 5. I wrote "Francis éèçà" and got exactly that after doing a rescan in IMatch. Of course, in my case the copyright is in the NEF and not in a sidecar.

Have you tried setting "Allow to create IPTC/..." to NO?

Francis

[attachment deleted by admin]

Mario

Quote from: emef on September 13, 2014, 01:12:52 PM
If none of this solves the problem, maybe I will try reinstalling an older version of IMatch (this problem did not exist two versions ago).

1. You cannot downgrade. The database may have changed, and an older version of IMatch may not understand the new database format.

2. If your sample image is too large to upload (you can surely resize it in LR/PS), you can send it to me by email. See the info in my signature. Please include a link to this topic in your email.


emef

Mario, I just sent you (in two emails) a picture (converted to JPEG), two XMP files (before and after the change in IM), and a reminder of my "Metadata 2" settings.

Mario

Thank you. Please be aware that I'm currently processing files, databases and DUMP files sent to me at the end of August. It can take some time until I get to your files.

joel23

@emef
Following cytochrome's advice about the ECP (adding the IPTC:CodedCharacterSet tag, on which he is right!) fixes the source of the problem, but not the damage already done, especially when the file has been edited many times.
The already converted characters can't be reverted (as you might have noticed, additional characters were added every time you edited the file: from © to Ã,© to ÂÃ,©, etc.).
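Each edit cycle of this kind turns every non-ASCII character into two characters, which is why the garbage grows with every save. A minimal Python sketch, assuming the simplified Latin-1 model (real files on Windows more likely pass through cp1252, so the exact characters differ from those quoted above); the function name is hypothetical:

```python
def corrupt_once(text: str) -> str:
    """One bad edit cycle: encode as UTF-8, then misread the bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


s = "©"
history = [s]
for _ in range(3):           # simulate three saves without the UTF-8 flag
    s = corrupt_once(s)
    history.append(s)

lengths = [len(h) for h in history]  # doubles with every cycle
```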

Since it has already spread to XMP and EXIF, IMHO the only solution is to apply the copyright again after the tag mentioned above has been created in IPTC.

ps
I filed a bug report today, because this tag should be set whenever IPTC is created or updated.
regards,
Joerg

cytochrome

Yes indeed, I should have mentioned this... for me it is obvious, because I could never reverse it once it reached the state of ÂÃ,© etc. It reverses in simple "first pass" cases. Otherwise one has to start again.
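A "first pass" repair simply inverts one cycle: re-encode the garbled characters as Latin-1 bytes and decode them as UTF-8, repeating while that succeeds. The Python sketch below assumes the same idealised Latin-1 model and uses hypothetical helper names. In that model every layer is reversible; in real files the chain presumably passes through cp1252 and several tools, and once any step substitutes or drops bytes there is nothing clean left to invert.

```python
def repair_once(text: str) -> str:
    """Undo one mis-decode: treat the characters as Latin-1 bytes, decode as UTF-8.

    Raises UnicodeError if the text is not a clean single mis-decode.
    """
    return text.encode("latin-1").decode("utf-8")


def try_repair(text: str, max_layers: int = 5) -> str:
    """Peel off mis-decoding layers until the next step fails or changes nothing."""
    for _ in range(max_layers):
        try:
            fixed = repair_once(text)
        except UnicodeError:  # covers UnicodeEncodeError and UnicodeDecodeError
            break
        if fixed == text:     # pure ASCII: nothing left to undo
            break
        text = fixed
    return text
```

The stopping rule is a heuristic: a string that legitimately contains sequences such as "Ã©" would be "repaired" too, so results should be checked by eye.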

As I explained in another post, I added this tag to my personal metadata panel: {File.MD.IPTC::EnvelopeRecord\90\CodedCharacterSet\0}, so I can see instantly whether a file has the UTF-8 flag set in its IPTC. When it does not, I set it with the ECP before writing any metadata to that file in IMatch.
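For reference, the value behind that tag is IPTC dataset 1:90 (envelope record), and UTF-8 is declared with the ISO 2022 escape sequence ESC % G (bytes 1B 25 47), which ExifTool displays as "UTF8". A minimal Python check on the raw value, with a hypothetical function name:

```python
from typing import Optional

# ISO 2022 escape sequence ESC % G, used in IPTC 1:90 to declare UTF-8.
UTF8_ESCAPE = b"\x1b%G"


def iptc_declares_utf8(coded_character_set: Optional[bytes]) -> bool:
    """Return True if the raw IPTC 1:90 value declares UTF-8.

    None (tag absent) or any other value means the reader has to guess,
    typically falling back to the local code page.
    """
    return coded_character_set == UTF8_ESCAPE
```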

Francis

[attachment deleted by admin]

joel23

Quote from: cytochrome on September 14, 2014, 10:36:41 PM
Yes indeed, I should have mentioned this... for me it is obvious, because I could never reverse it once it reached the state of ÂÃ,© etc. It reverses in simple "first pass" cases.
Never mind. But yes, it might be correctable right after it first appears in the panel. Maybe. IMHO it depends on the settings, and on whether the corrupt IPTC data has already been written to EXIF and XMP.

regards,
Joerg

emef

Quote from: joel23 on September 14, 2014, 09:36:45 PM
@emef
Following cytochrome's advice about the ECP (adding the IPTC:CodedCharacterSet tag, on which he is right!) fixes the source of the problem, but not the damage already done, especially when the file has been edited many times.
The already converted characters can't be reverted (as you might have noticed, additional characters were added every time you edited the file: from © to Ã,© to ÂÃ,©, etc.).

Since it has already spread to XMP and EXIF, IMHO the only solution is to apply the copyright again after the tag mentioned above has been created in IPTC.

ps
I filed a bug report today, because this tag should be set whenever IPTC is created or updated.

Thank you for your help. I am also considering going back to Lightroom after cataloging my photos in IMatch, just to re-inject the "keywords" with the correct character encoding. It is only a matter of discipline; I just hope this is the only field affected by the problem, otherwise it would become too tedious.
But of course I will wait for Mario's response regarding the files I sent him by email.

Mario

I finally found the time (sorry) to look into this.
When I look at your file in the ECP, I get:

[IPTC]          Coded Character Set             : UTF8
[IPTC]          Envelope Record Version         : 4
[IPTC]          Copyright Notice                : (c) emef...Tous droits rÃÆ'Ã,©servÃÆ'Ã,©s sauf


which means that the IPTC in that file is marked as UTF-8 encoded, but apparently it is not. The Copyright Notice contains nonsense data (it looks like data that has been converted numerous times between different character sets, and wrongly). IMatch cannot "fix" this when importing the file, whatever character set you choose to override ExifTool's default handling. I tried various options, but all charset overrides return the same nonsense data.

emef

Thank you, Mario. For now I will make do with going back and forth between Lightroom and IM5. ::)