Metadata issues with accented characters

Started by timag, December 30, 2016, 12:32:02 AM


timag

I have an issue that is driving me crazy. Over the past year I've been adding files with location data that causes iMatch to behave strangely. I think I've figured out what is happening but am unsure why it is happening and how to stop it.


  • I move my photos from memory card (CR2 from a Canon 7DmkII) to my hard disk
  • I add geo data with GeoSetter (either to add names if I had GPS turned on in camera, or everything if I didn't)
  • I open iMatch and everything gets automatically imported when iMatch sees them
  • Here is where it gets funky: with the files that cause the issues iMatch reads the geo related keywords (say "Slovenië")
  • iMatch will not show "Slovenië" but will show something like "SloveniA&"
  • iMatch will then write back "SloveniA&" to the file
  • iMatch will not show "SloveniA&" but will show something like "SloveniA&#^"
  • iMatch will then write back the last keyword and keep going like this in a loop

It seems like iMatch reads the metadata wrong and then, once it is in the database, sees that it doesn't match what's in the file and tries to write the database contents back to correct that. It adds it as a new keyword. But on the next metadata read it misreads it again and the whole thing starts over.

I've found files that had 15 MB worth of stuff appended to keywords because of this. So far I've managed to rescue all the files again.

All files where I have issues have keywords with accents in them.
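
The runaway growth is consistent with a repeated character set mismatch. Here is a minimal Python sketch, assuming the text is written as UTF-8 and read back as Latin-1 on every cycle. The function name `misread_cycles` is made up for illustration, and this is only a model of the symptom, not a description of what IMatch or ExifTool do internally.

```python
def misread_cycles(text, rounds):
    """Write text as UTF-8, read it back as Latin-1, and repeat.

    Returns the string after each cycle, starting with the original.
    Assumption: every cycle uses the same wrong character set.
    """
    history = [text]
    for _ in range(rounds):
        # The writer stores UTF-8 bytes, the reader decodes them as Latin-1.
        text = text.encode("utf-8").decode("latin-1")
        history.append(text)
    return history


if __name__ == "__main__":
    for step, value in enumerate(misread_cycles("Slovenië", 4)):
        # repr() keeps invisible control characters visible.
        print(step, len(value), repr(value))
```

In this model every non-ASCII character turns into two non-ASCII characters per cycle, so a single accented letter doubles its footprint each time, while a pure ASCII keyword such as "Zagreb" never changes. That would explain why only keywords with accents are affected.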

Does anyone have a clue what is going on here? Do I have something configured wrong?

Mario

The usual cause of wrong handling of metadata containing non-ASCII characters is legacy IPTC data that is not UTF-8 encoded. XMP is UTF-8 encoded and supports all languages. As far as I know, GeoSetter writes proper GPS and location data into XMP, so this is usually not an issue.

EXIF has no notion of character sets at all. Legacy IPTC at least allows specifying whether it is encoded as UTF-8 or in a local character set, whatever that local character set may have been at the time.
ExifTool automatically looks for this information and uses it during import. If this is not working, the metadata in your files is messed up.

You can tell IMatch to use a specific character set during import under Edit > Preferences > Metadata. Press <F1> while in this dialog for detailed info.

When you use the extensive features in IMatch to geo-code your files (Map Panel, Reverse geo-coding) do you see the same effect?

And, as always when a user reports problems with the metadata in a specific file, I need a sample file. I need to look at the actual metadata in the file to see what's going on. You can upload the file to your cloud space and post a link. Or send it to me via email (include a link back to this topic). See: https://www.photools.com/support.
-- Mario
IMatch Developer
Forum Administrator
http://www.photools.com

timag

Sorry for posting in the wrong area of the board!

Here is a link to an example CR2 and accompanying XMP file: https://1drv.ms/f/s!AqLf4pYSadd-llaNIvxLOpwIBlLk

If I look at the CR2 file there are 2 IPTC fields that contain accented characters. Those are the fields that seem to be causing the issues initially. iMatch is set to read IPTC as UTF-8 but that might be wrong. I think I changed it awhile ago to this setting while trying to fix this. I am not sure what the setting should be.

I will have a look at the built-in capability for reverse geocoding. I've had GeoSetter in my workflow for ages and haven't got around to changing it. I will still have to fix this issue for the existing pictures first though...

Mario

Quote: iMatch is set to read IPTC as UTF-8 but that might be wrong. I think I changed it awhile ago to this setting while trying to fix this. I am not sure what the setting should be.

The default is "Default".

This means ExifTool decides. ExifTool checks whether the IPTC data is marked as UTF-8. If not, it interprets the characters using the local character set. That is correct in most cases, unless you write legacy IPTC data on a computer with, for example, a Russian locale and then send the files to somebody who runs a Japanese locale. In that case your friend in Japan has to force IMatch to interpret the legacy IPTC data using the Russian locale, not the Japanese one.

If you have intentionally forced IMatch to interpret legacy IPTC data as UTF-8, but it was actually written using your locale character set, you are creating exactly the mess you are in now. The characters in the file are interpreted wrong, causing problems.
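
To make the difference between "Default" and a forced character set concrete, here is a simplified Python model. It is a sketch under stated assumptions, not ExifTool's or IMatch's actual code: the function name `decode_legacy_iptc`, its parameters, and the `cp1252` fallback for the local character set are all hypothetical. The one detail taken from the IPTC standard is that the CodedCharacterSet field marks UTF-8 with the escape sequence ESC % G.

```python
# ISO 2022 escape sequence (ESC % G) that IPTC uses to flag text as UTF-8.
UTF8_MARKER = b"\x1b%G"


def decode_legacy_iptc(raw, coded_charset=None, local_charset="cp1252", forced=None):
    """Decode legacy IPTC bytes (hypothetical helper for illustration).

    forced=None models the "Default" setting: trust the UTF-8 marker if
    present, otherwise fall back to the local character set.
    """
    if forced is not None:
        # The user overrides whatever the file says.
        return raw.decode(forced, errors="replace")
    if coded_charset == UTF8_MARKER:
        return raw.decode("utf-8", errors="replace")
    return raw.decode(local_charset, errors="replace")


if __name__ == "__main__":
    # Keyword written in the local character set, with no UTF-8 marker.
    written = "Kroatië".encode("cp1252")
    print(decode_legacy_iptc(written))                  # default behavior
    print(decode_legacy_iptc(written, forced="utf-8"))  # forced UTF-8
```

With the default, the bytes are read back as "Kroatië". Forcing UTF-8 on the same bytes cannot decode the accented character, and the sketch replaces it with U+FFFD. Once such a damaged value is written back to the file, each further read and write can add more damage.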

1. Set the setting to Default. Both read/write.
2. Make sure you have metadata protection set to Off (Metadata 2 > Protection: Protect unwritten data).
3. Select a few files, press Shift+Ctrl+F5 and use the "Reload Metadata" option to force a re-import of all metadata using the current settings.

I have imported your file using the default IMatch settings for metadata.
The relevant data is here:

Hierarchical Keywords   Donje Prekrizje; Gornje Prekrizje; HRV; Kroatië; Zagreb
City   Gornje Prekrizje
Location   Donje Prekrizje
ISO Country Code   HRV
Country   Kroatië
State/Province   Zagreb

This looks good to me.

timag

Thanks a lot for having a look!

I did some more digging on my end, prompted by your earlier comments. I've found a setting in GeoSetter that was set incorrectly as well, making me think that I was importing UTF-8 only when I was not. Then when things weren't working as expected in iMatch it made me change the import setting for IPTC in iMatch to UTF-8, making things worse as you explained.

I've started on the process you described, I hope I can get everything fixed now!