Data Conversion: "German Umlaut" Issue

Started by jelvers, February 26, 2014, 12:45:42 PM


jelvers

I recently reported problems with data conversion results, whereby "Image Info" for certain files had not been converted into IM5. After a thorough check and many conversion trials, I finally discovered that all files with a German umlaut in their name, and all files in a directory or subdirectory whose name contains an umlaut, were not converted correctly. I guess the umlaut somehow prevents the conversion program from converting the Image Info data.

But there is a very simple workaround: just changing the umlaut 'ü' into 'ue', for example, made the conversion process go as smoothly as expected. I have changed all umlaut names in my files/directories and data conversion runs like a dream.

My advice for the time being: don't use 'Umlauts' in directory or file names.
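For anyone who wants to script the renaming described above, here is a minimal Python sketch. The mapping table and the function names are illustrative assumptions, not part of IMatch or ExifTool, and the sketch only computes the new name; it does not rename anything on disk.

```python
import unicodedata

# Hypothetical mapping: German umlauts and sharp s to ASCII digraphs.
UMLAUT_MAP = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}


def transliterate_umlauts(name: str) -> str:
    """Return name with each German umlaut replaced by its ASCII digraph."""
    # Normalize first so that "u" + combining diaeresis is treated like "ü".
    composed = unicodedata.normalize("NFC", name)
    return "".join(UMLAUT_MAP.get(ch, ch) for ch in composed)


def is_ascii_name(name: str) -> bool:
    """True if the name contains only ASCII characters."""
    return name.isascii()
```

Names with other accented characters (é, è and so on) pass through unchanged, so is_ascii_name can be used to find what is left over.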

Juergen

Mario

ExifTool cannot access files whose names (or folder names) contain non-ASCII characters if 8.3 file name support has been disabled for the media containing the files. Could this be the problem on your system?

jelvers

Quote from: Mario on February 26, 2014, 01:14:03 PM
ExifTool cannot access files whose names (or folder names) contain non-ASCII characters if 8.3 file name support has been disabled for the media containing the files. Could this be the problem on your system?

I think you have a valid point. My W8 registry entry shows NtfsDisable8dot3NameCreation = 2, which to my knowledge is identical to the W7 entry of 1, meaning that 8.3 support is disabled. I don't want to fiddle with the registry on my W8.1 system, so I'll leave it as is. In the meantime I have changed the affected file names, which fortunately wasn't a lot of work.

Anyway, IM5 runs like a dream now. It is a great piece of software  :) :) :) :) !!! I am waiting for the final release!

Juergen

Mario

A solution for this dreaded problem is still on my to-do list.
Phil says that he cannot handle it in ExifTool because the Perl runtime on which ExifTool is built does not handle Unicode or UTF-8 file names, at least not without major changes in ExifTool.

IMatch is sending UTF-8-encoded file names to ExifTool, which should be able to work around this. But Perl does not interpret UTF-8 file names correctly. This causes the strange situation that a user is able to run

exiftool.exe -all:all c:\blümchen\tülpe.jpg

on the Windows command line, but reading or writing metadata from/to the same file fails in IMatch 5. This is because IMatch encodes the folder and file name in UTF8, which is handled by ExifTool and passed to Perl, which fails to recognize the file name...
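The effect can be illustrated with a few lines of Python. This is only a sketch of the general mechanism, assuming the user's code page is Windows-1252; it is not how ExifTool or Perl work internally, and the function name is made up for the example.

```python
def misread_utf8_as(name: str, code_page: str = "cp1252") -> str:
    """Encode name as UTF-8, then decode the bytes as if they were code_page."""
    utf8_bytes = name.encode("utf-8")
    # A program that expects code-page bytes sees two characters per umlaut.
    return utf8_bytes.decode(code_page)


original = "c:\\blümchen\\tülpe.jpg"
seen_by_receiver = misread_utf8_as(original)
# The receiver ends up looking for a file name that does not exist on disk.
print(original, "->", seen_by_receiver)
```

Each 'ü' is two bytes in UTF-8 (C3 BC), so a receiver that reads one byte per character ends up with a different, longer name.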

IMatch has to send arguments to ExifTool in UTF8 in order to support non-ASCII languages and metadata values. I need to find a way somehow to send everything in UTF8, except the file names. Then IMatch will be able to support umlauts and other characters as long as the code page the user is using handles the characters as well.
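Whether a given name survives in the user's code page can be checked up front. A minimal Python sketch, assuming code page 1252 as the default; the function name is hypothetical and this is not IMatch code:

```python
def fits_code_page(name: str, code_page: str = "cp1252") -> bool:
    """True if every character of name can be encoded in code_page."""
    try:
        name.encode(code_page)
    except UnicodeEncodeError:
        return False
    return True
```

With this, "tülpe.jpg" fits code page 1252, while a Cyrillic name such as "фото.jpg" does not (it fits cp1251 instead).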

I will tackle this somehow before the final release. Which means soon.


cytochrome

One solution could be to test for the CodedCharacterSet tag in new files at import. If it is not already set, convert to UTF-8. This is slow but done only once. One possible problem is that the tag is an IPTC tag.

This is what I do manually with the EP for old folders that were already ingested into IM and are not in UTF-8:
-overwrite_original_in_place
-codedcharacterset=utf8
{Files}

Francis

Mario

It's only about the file and folder names when the user has disabled 8.3 file names.

IMatch and ExifTool have no problem with different metadata encodings, these are always handled correctly.

cytochrome

This is true now that I take care to set any program I use to ingest files to Unicode/UTF-8.

But if you have old folders dating back to a time when you had no idea about diacritic characters and the trouble they can cause, and if some files were encoded as Latin, Latin-2 or Unicode (which is my case), and you have written metadata to IPTC (which is my case), you may be in for bad surprises. Of course one can set the metadata panel to read IPTC as Latin or Latin-2, but then, when some files are UTF-8, there are more surprises  :).

So I display {File.MD.IPTC::EnvelopeRecord\90\CodedCharacterSet\0} in my custom metadata panel and convert to utf-8 when needed.
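The mixed-encoding situation can be sketched in Python as follows. This is only an illustration of the "try UTF-8 first, then fall back" idea on raw bytes; the function name and the cp1252 fallback are assumptions, and it is not how IMatch or ExifTool decide on an encoding.

```python
def decode_legacy_text(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode raw as UTF-8 if it is valid UTF-8, else use the fallback code page."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Bytes like a lone 0xE9 are not valid UTF-8, so treat them as legacy text.
        return raw.decode(fallback, errors="replace")
```

The guess is not reliable in the other direction: pure ASCII text decodes the same way in both encodings, and the fallback cannot tell Latin-1 from Latin-2, which is why an explicit CodedCharacterSet value is the safer signal.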

Francis

sinus

Quote from: jelvers on February 26, 2014, 12:45:42 PM

My advice for the time being: don't use 'Umlauts' in directory or file names.

Juergen

It is sad and hard to believe, but you are completely right. If we use umlauts, some programs, DAMs, websites, smartphones, email programs and so on show them correctly, and some do not.

Even today I see a lot of problems when I use umlauts. Hence I have not used umlauts for several years. But sometimes clients insist on using them ("that's not a problem any more these days"), so I use them and (sometimes) the problems start.

I do not use umlauts, even if it sometimes looks awkward. I also do not use accents and so on (éèüöä) ... very sad, but that is how it is. Unless you have a workflow where everything is correct and a file never leaves this "safe region".
Best wishes from Switzerland! :-)
Markus

cytochrome

All this is strange and sad indeed.

I use diacritic characters in folder names (and not only 8.3 names, e.g. "février-2014") and never had problems. My photo names are normalized after the camera body, so no diacritics there. The problems (mostly gone now) were with diacritics in description, headline and location. The worst was with metadata written in Photomechanic and opened in Bibble/ASP. For a long time Photomechanic used windows-1252, perfect for the US. Now one can use UTF-8...

In IM it is simple now: if needed I convert to UTF-8 and read/write UTF-8; ExifTool is happy and so am I.

Francis

David_H

Quote from: jelvers on February 26, 2014, 02:45:08 PM
Quote from: Mario on February 26, 2014, 01:14:03 PM
ExifTool cannot access files whose names (or folder names) contain non-ASCII characters if 8.3 file name support has been disabled for the media containing the files. Could this be the problem on your system?

I think you have a valid point. My W8 registry entry shows NtfsDisable8dot3NameCreation = 2, which to my knowledge is identical to the W7 entry of 1, meaning that 8.3 support is disabled. I don't want to fiddle with the registry on my W8.1 system, so I'll leave it as is. In the meantime I have changed the affected file names, which fortunately wasn't a lot of work.

Anyway, IM5 runs like a dream now. It is a great piece of software  :) :) :) :) !!! I am waiting for the final release!

Juergen

If the registry shows a value of 2, that means it is enabled (or disabled) on a per-volume basis.

From an elevated command prompt, run fsutil 8dot3name query <drive>

You should get output similar to:
The volume state is: 0 (8dot3 name creation is enabled).
The registry state is: 2 (Per volume setting - the default).

Based on the above two settings, 8dot3 name creation is enabled on c:

If you need to turn it on: fsutil 8dot3name set <drive> 0
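If you want to check this from a script, the two status lines can be parsed like this. It is a rough Python sketch that assumes the English wording shown above (the text differs on localized Windows versions); the function names are made up, and the meaning of the registry values 0, 1 and 3 is taken from my reading of the fsutil documentation, not from this thread.

```python
import re
from typing import Optional

# Matches the two status lines of "fsutil 8dot3name query <drive>" (English output).
_STATE_RE = re.compile(r"The (volume|registry) state is: (\d+)")


def parse_8dot3_query(output: str) -> dict:
    """Return e.g. {"volume": 0, "registry": 2} from the fsutil query output."""
    return {kind: int(value) for kind, value in _STATE_RE.findall(output)}


def short_names_enabled(states: dict) -> Optional[bool]:
    """True/False if the states decide the question, None if they do not."""
    registry = states.get("registry")
    volume = states.get("volume")
    if registry == 0:   # assumed: enabled on all volumes
        return True
    if registry == 1:   # assumed: disabled on all volumes
        return False
    if registry == 2 and volume is not None:
        return volume == 0  # per-volume setting: 0 means enabled
    return None  # e.g. registry 3 depends on whether this is the system volume


sample = (
    "The volume state is: 0 (8dot3 name creation is enabled).\n"
    "The registry state is: 2 (Per volume setting - the default).\n"
)
print(short_names_enabled(parse_8dot3_query(sample)))
```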

axel.hennig

Quote from: Mario on February 26, 2014, 07:40:55 PM
A solution for this dreaded problem is still on my to-do list.

Since I bought a new hard drive, this came to my mind again.

I enabled 8dot3 names, but is that still necessary? (The help says yes.)

Mario

ExifTool solved this a while ago, and IMatch has used the UTF-8 file name option since then.
Where in the help did you find the reference?

There may be other 3rd party components, video and file format converters which IMatch uses that fail with non-ANSI file names. I don't test this explicitly, mind.
A standard rule for DAM is to use plain folder and file names only, without language-specific characters. This greatly reduces the risk of file-name related problems, especially if you work on different operating systems, upload your files to web services etc.

axel.hennig

Quote from: Mario on December 07, 2017, 01:20:41 PM
Where in the help did you find the reference?

I entered "8dot3" in the search. The result was just one hit with title "ExifTool". I think the page is tech_exiftool.htm.

At the bottom: Due to the inability of Perl (the programming language ExifTool is written in) to handle Unicode file names, ExifTool cannot access file names which contain non-ASCII characters. Or, to be more precise, file names with characters not included in the code page configured for the user running IMatch.

See also screenshot.

Mario

I have removed the corresponding info.
IMatch has always used the UTF-8 file name encoding since ET introduced it.
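For readers who call ExifTool from their own scripts, the same idea can be sketched in Python. ExifTool has a -charset filename=utf8 option and can read arguments from a file with -@; everything else here (the helper name, the argfile location, the -all:all example taken from earlier in the thread) is an assumption for illustration and not a description of what IMatch does internally.

```python
from pathlib import Path


def write_utf8_argfile(image_path: str, argfile: Path) -> list:
    """Write a UTF-8 argument file and return the matching exiftool command line."""
    # One argument per line; the file name is stored as UTF-8, not in the code page.
    lines = ["-all:all", image_path]
    argfile.write_text("\n".join(lines) + "\n", encoding="utf-8")
    # -charset filename=utf8 tells ExifTool how the file names are encoded.
    return ["exiftool", "-charset", "filename=utf8", "-@", str(argfile)]
```

The returned list can be passed to subprocess.run. Putting the file name into a UTF-8 argfile rather than directly on the command line avoids the command line being converted to the system code page on the way to ExifTool.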