Cyrillic in IPTC keywords

Started by mosdubindi1, January 04, 2024, 06:20:14 PM

mosdubindi1

Hello Mario,

Sorry to bother you, but I'm stuck on a problem. Could you please help? I'm trying to manage some keywords in Cyrillic. Hierarchical and XMP keywords are fine; Cyrillic survives write-back without problems. However, Cyrillic in IPTC is replaced with junk ('???'), an error about mismatched keywords in IPTC and XMP appears, and the yellow pencil remains active. Under Metadata 2, the encodings for read and write are both set to Default.

I tried deleting the legacy IPTC metadata with the ExifTool Command Processor, but after write-back the problem returns.

So I can neither get rid of nor ignore the junk in the IPTC keywords: because of the error, a write-back is always pending for that file, which is not acceptable.

Is it possible to manage Cyrillic in keywords? For me it is acceptable to have only Hierarchical and XMP keywords with Cyrillic and nothing in IPTC. It is also fine if all three (Hierarchical, XMP and IPTC keywords) are aligned.

Thanks in advance,
Dmitiy

Mario

Special characters and non-ASCII languages were one reason legacy IPTC was abandoned 20 years ago in favor of XMP.
The problem: when the application that wrote the IPTC data did not specify the character set, and the metadata was then processed on a computer with a different character set / locale, the result was usually junk data.

Deleting the legacy IPTC data should have fixed this, since it will remove the source of confusion. Where should the junk come from when there is no legacy IPTC data anymore?
XMP uses UTF-8 encoding and this allows it to deal with all languages in use.
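
To illustrate where the '????' typically comes from, here is a minimal Python sketch. It is not IMatch or ExifTool code, and the function name is made up for illustration. It assumes the writer falls back to a character set without Cyrillic letters (ASCII here) and substitutes '?' for every character it cannot encode:

```python
def write_and_read(keyword: str, charset: str) -> str:
    """Encode a keyword as a writer limited to `charset` would,
    replacing unencodable characters with '?', then decode it again."""
    raw = keyword.encode(charset, errors="replace")
    return raw.decode(charset)


keyword = "Зима"
print(write_and_read(keyword, "ascii"))   # ???? (every Cyrillic letter is lost)
print(write_and_read(keyword, "utf-8"))   # Зима (UTF-8, as used by XMP)
print(write_and_read(keyword, "cp1251"))  # Зима (Windows Cyrillic code page)
```

Once the letters have been replaced with '?', the original text cannot be recovered from the IPTC record, whatever character set the reader uses.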

The MDA also reports a mismatch between IPTC and XMP keywords, which will further add to the confusion.
What file format is this? Since there is an XMP sidecar file, is it a RAW file?

As usual, when a user reports problems with the metadata in a file, it helps greatly when they upload the file somewhere and post a link. All else is just guesswork. Also include the XMP file.
-- Mario
IMatch Developer
Forum Administrator
http://www.photools.com  -  Contact & Support - Follow me on 𝕏 - Like photools.com on Facebook

mosdubindi1

Thank you Mario,

Sorry for not providing the file. You are right, it is a RAW file. Here are the links:
Link to the file:
https://drive.google.com/file/d/1lGS1ESmQHpwPfOk7IhiWIK_foH_CgaIk/view?usp=drive_link
Link to the sidecar XMP:
https://drive.google.com/file/d/1Dec7JI2Dmbus-lLn17KzgeznuOptuzZX/view?usp=drive_link

I've added more Cyrillic fields and done one more write-back. The situation with the Cyrillic keywords got worse; it looks like the junk travels from IPTC to XMP and back.

Before write-back:
[IPTC]          Keywords                        : Russian Federation, Moscow Region, Ramenskoe District, Malakhovo, The Church of Demetrius of Thessalonica, ????
[XMP-dc]        Subject                        : Russian Federation, Moscow Region, Ramenskoe District, Malakhovo, The Church of Demetrius of Thessalonica, Зима

After write-back:
[XMP-dc]        Subject                        : Russian Federation, Moscow Region, Ramenskoe District, Malakhovo, The Church of Demetrius of Thessalonica, ????, Russian Federation, Moscow Region, Ramenskoe District, Malakhovo, The Church of Demetrius of Thessalonica, Зима
[IPTC]          Keywords                        : Russian Federation, Moscow Region, Ramenskoe District, Malakhovo, The Church of Demetrius of Thessalonica, ????, Russian Federation, Moscow Region, Ramenskoe District, Malakhovo, The Church of Demetrius of Thessalonica, ????
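
Listings like the ones above can be compared with a few lines of Python. This is only an illustrative sketch, not part of IMatch or ExifTool: the function names are hypothetical, and it assumes the `[Group] Tag : value, value` layout shown above and keywords that contain no commas.

```python
def parse_keywords(listing: str) -> dict[str, list[str]]:
    """Map each 'Group:Tag' found in the listing to its list of values."""
    result = {}
    for line in listing.splitlines():
        if not line.startswith("[") or ":" not in line:
            continue  # skip anything that is not a '[Group] Tag : ...' line
        name, _, value = line.partition(":")
        group, _, tag = name.partition("]")
        key = f"{group.lstrip('[').strip()}:{tag.strip()}"
        result[key] = [v.strip() for v in value.split(",") if v.strip()]
    return result


def mismatches(iptc: list[str], xmp: list[str]) -> tuple[set[str], set[str]]:
    """Return (keywords only in IPTC, keywords only in XMP)."""
    return set(iptc) - set(xmp), set(xmp) - set(iptc)


before = """\
[IPTC]          Keywords                        : Russian Federation, Moscow Region, Ramenskoe District, Malakhovo, The Church of Demetrius of Thessalonica, ????
[XMP-dc]        Subject                        : Russian Federation, Moscow Region, Ramenskoe District, Malakhovo, The Church of Demetrius of Thessalonica, Зима
"""
tags = parse_keywords(before)
only_iptc, only_xmp = mismatches(tags["IPTC:Keywords"], tags["XMP-dc:Subject"])
print(only_iptc, only_xmp)  # {'????'} {'Зима'}
```

Run against the "After write-back" listing, the same comparison finds nothing that exists only in IPTC, because '????' is now present on both sides.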

And the yellow pencil is active (see attached).

As for other fields with Cyrillic:
{File.MD.XMP::photoshop\Headline\Headline\0}
  • XMP-photoshop is OK.
  • IPTC-headline is NOT OK -> junk. No red error is shown.
{File.MD.XMP::dc\description\Description\0}
  • XMP-dc Description is OK.
  • XMP-exif User comment is OK.
  • XMP-tiff Image description is OK.
  • IFD0 Image description is OK.
  • ExifIFD User comment is OK.
  • IPTC Caption-Abstract is NOT OK -> junk. No red error is shown.
{File.MD.XMP::iptcCore\AltTextAccessibility\AltTextAccessibility\0} - no problem as it is only XMP.
{File.MD.XMP::iptcCore\ExtDescrAccessibility\ExtDescrAccessibility\0} - no problem as it is only XMP.

So in the end only the keyword fields are problematic. After deleting the legacy IPTC (IIM) data the error disappears, but after write-back it comes back. If the IPTC keywords could be deleted with no write-back required, that would be OK for me.

Thanks,
Dmitrij 


Mario

Quote: So finally only keyword fields are problematic. After deleting Legacy IPTC (IIM) the error disappears, but after write-back it comes back.
IMatch only writes legacy IPTC when the file already has IPTC. It will not (should not) write legacy IPTC after you have removed the legacy IPTC data. I need to verify this with your image files.

And then the character set configured under Edit > Preferences > Metadata comes into play. If there is no character set tag in the legacy IPTC record, ExifTool assumes ASCII, unless configured otherwise. See help for details.
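
Roughly, the decision a reader has to make looks like the following Python sketch. This is a simplification for illustration, not ExifTool's actual code: the function name and the `fallback` parameter are hypothetical. What it does rely on is that legacy IPTC announces UTF-8 in the CodedCharacterSet dataset (record 1, dataset 90) with the escape sequence ESC % G.

```python
from typing import Optional

UTF8_MARKER = b"\x1b%G"  # ISO 2022 escape sequence announcing UTF-8


def decode_iptc_value(raw: bytes, coded_charset: Optional[bytes], fallback: str) -> str:
    """Decode an IPTC string.

    coded_charset: content of the CodedCharacterSet dataset, or None if absent.
    fallback: character set to assume when no UTF-8 marker is present
              (stands in for the reader's configured setting).
    """
    charset = "utf-8" if coded_charset == UTF8_MARKER else fallback
    return raw.decode(charset, errors="replace")


stored = "Зима".encode("cp1251")  # written as Windows Cyrillic, no marker
print(decode_iptc_value(stored, None, "cp1251"))   # Зима
print(decode_iptc_value(stored, None, "latin-1"))  # Çèìà (wrong guess, junk)
print(decode_iptc_value("Зима".encode("utf-8"), UTF8_MARKER, "latin-1"))  # Зима
```

Without the marker, the result depends entirely on whether the assumed character set matches the one the writer used.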

I already see that the RAW has some embedded XMP, and you also have an XMP sidecar file and legacy IPTC data.
Which camera or software produced this mess? Something from Nikon perhaps?

The links to Google you have posted are not public, Google wants me to log in or to create an account. Which I won't do, obviously. Can you make the two links public so I can download them?

mosdubindi1

Thanks Mario,

I've made the links public, could you pls. try again?

The camera is a Fujifilm. No other software is involved, as I'm working with the RAW file directly in IMatch, and the keyword mess is most probably not camera-related. I don't really care about the other warning messages for now.

Interestingly, when I convert another RAW file to TIFF, copy and paste the metadata from the RAW, and look at the metadata in Photoshop -> File -> File Info, the data is shown correctly with no junk, even though the ExifTool Command Processor shows junk in IPTC for the TIFF.

I've repeated the same exercise with the TIFF: I deleted IPTC using the preset "Delete legacy IPTC (IIM) metadata". The metadata was deleted; however, the IPTC keyword was not removed from IMatch, and after write-back the mess with the IPTC keywords comes back, the yellow pencil is active, etc.

mosdubindi1

Hello Mario,

You can disregard my previous message.

I've checked again, and initially there is no IPTC data in the RAW file. But when I add a keyword, all three tags are filled immediately, including IPTC:
{File.MD.XMP::Lightroom\hierarchicalSubject\HierarchicalSubject\0}
{File.MD.XMP::dc\subject\Subject\0}
{File.MD.IPTC::ApplicationRecord\25\Keywords\0}

Then write-back becomes pending, and after performing it, the whole IPTC section pops up in the metadata, the Cyrillic mess starts, and deleting IPTC does not help. Without Cyrillic, I guess having IPTC data would be no problem.

Maybe it is possible not to fill the tag {File.MD.IPTC::ApplicationRecord\25\Keywords\0} when a keyword is created, so that it does not trigger the creation of IPTC data? And, for those who need IPTC metadata, to provide some separate management of IPTC keywords? Before the new version of IMatch, the IPTC tag remained empty when I added a keyword.

By the way, I've created keywords in two ways: by template (see attached) from the Location Shown data, and manually in the Metadata Panel. The result is the same.

Mario

Before you delete the legacy IPTC data, make sure all protection settings in Edit > Preferences > Metadata 2 are set to No.
After deleting the legacy IPTC data in the file, you have to force an update of the metadata using Shift+Ctrl+F5 > Reload Metadata, just in case.

Otherwise the previously imported IPTC header and the internally linked keywords will be retained, causing IMatch to re-create legacy IPTC data during write-back. Keywords are special in that regard and protecting XMP will also "protect" the legacy IPTC keywords.

Then mark the keywords in the Keyword Panel as modified (click the pen in front of the input field) and write back. Only XMP data is written to the file, no legacy IPTC data is re-created.

When you're done with this set of files, you may want to re-enable the default protection settings which are Yes, No, Yes.

I don't recommend keeping the legacy IPTC data, but if you must, set the character set options for IPTC in Edit > Preferences > Metadata to Cyrillic to ensure that the keywords and other data are written with the correct character set by ExifTool.
But that often causes other issues, and unless you must maintain metadata in legacy IPTC for some reason, removing it and using only XMP is the much better option.

mosdubindi1

Thank you very much Mario,

I agree with you: for RAW and TIFF files the XMP-only solution is the best for me, and now I can proceed. IMatch is the perfect tool for sure.

As for JPEG files, Cyrillic works more or less OK even with IPTC data present.

Mario

Quote: As for JPEG files - Cyrillic is working more or less OK even having IPTC data.
This depends on whether or not the legacy IPTC data has a character encoding specified, whether that encoding matches your Metadata > IPTC Character set encoding setting, etc. Legacy IPTC was always a bit fuzzy about character set encodings, but that was solved by XMP a long time ago.