photools.com Community

IMatch Discussion Boards => IMatch Scripting and Apps => Topic started by: ubacher on October 18, 2017, 02:52:38 PM

Title: Writing text with Umlaute in Json file - reads back wrong
Post by: ubacher on October 18, 2017, 02:52:38 PM
I used the sample app "Files" to write a text file and then read it back. I modified the app to write
some German umlauts at the end: "message" : "Hello from IMatch äöüß",

When I then read this JSON file back, it does not show the umlauts correctly. What's wrong?

I attach the demo.json file for others to try (remove the .txt from the name before using).

My settings: Windows 10 English, Region: Austria, Language: English (UK)
Title: Re: Writing text with Umlaute in Json file - reads back wrong
Post by: Mario on October 18, 2017, 03:06:35 PM
This is what I see in my editor. All German umlauts are perfectly OK.
The umlauts are also OK in Windows Notepad.

(https://www.photools.com/community/index.php?action=dlattach;topic=7257.0;attach=16562;image)

You need to use an editor that handles UTF-8 encoding.
Which editor did you use?

Title: Re: Writing text with Umlaute in Json file - reads back wrong
Post by: ubacher on October 18, 2017, 04:54:33 PM
I think it might be a problem with the read, not the writing of the JSON file.
Output from files app:
(https://www.photools.com/community/index.php?action=dlattach;topic=7257.0;attach=16564)
Title: Re: Writing text with Umlaute in Json file - reads back wrong
Post by: Mario on October 18, 2017, 06:29:09 PM
Please file a bug report so I can look into this for a later release.
Title: Re: Writing text with Umlaute in Json file - reads back wrong
Post by: Mario on October 18, 2017, 07:16:00 PM
No need for the bug report. This is already fixed for the next release.
The read file endpoint did not treat files as UTF-8 when no BOM was included. But the BOM is optional: the Unicode Standard permits the BOM in UTF-8 but does not require or recommend its use (https://en.wikipedia.org/wiki/Byte_order_mark, http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf).
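
To illustrate the effect described above, here is a minimal Python sketch. It is not IMatch code, and it assumes cp1252 as the Windows "ANSI" code page (typical for Western European systems). It shows what happens when UTF-8 bytes without a BOM are decoded as ANSI:

```python
import codecs

text = "Hello from IMatch äöüß"

# UTF-8 bytes without a BOM, as many JSON writers produce them.
raw = text.encode("utf-8")

# Decoding these bytes as Windows ANSI (cp1252 is an assumption here)
# turns every umlaut into two unrelated characters.
as_ansi = raw.decode("cp1252")

# Decoding the same bytes as UTF-8 restores the original text.
as_utf8 = raw.decode("utf-8")

# The optional UTF-8 BOM is the 3-byte sequence EF BB BF in front of the data.
with_bom = codecs.BOM_UTF8 + raw

print(as_ansi)       # Hello from IMatch Ã¤Ã¶Ã¼ÃŸ
print(as_utf8)       # Hello from IMatch äöüß
print(with_bom[:3])  # b'\xef\xbb\xbf'
```

The bytes on disk are identical in both cases; only the encoding assumed by the reader differs.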
Title: Re: Writing text with Umlaute in Json file - reads back wrong
Post by: jeg on November 27, 2017, 10:14:05 PM
I may have a similar problem. I use Ghostscript to extract the text content of a PDF file. In this file I can see the umlauts like this
Für 1 R
but when I use IMatch.readTextFile I get the following
F�r 1 R
as the result. Will the problem be solved with the next release?
Title: Re: Writing text with Umlaute in Json file - reads back wrong
Post by: Mario on November 28, 2017, 08:07:22 AM
You read a PDF file with readText in IMatch? This cannot work.

Or do you produce a text file from GS? If so, make sure it's in standard Windows Unicode or, when you need to produce a UTF-8 encoded file, make sure it has a BOM header.

readTextFile currently interprets files without a UTF-8 or Unicode BOM at the beginning as standard Windows ANSI-encoded. The next version interprets files without a BOM as UTF-8.
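
The difference between the two behaviors can be sketched in Python as follows. This is not the actual IMatch implementation: the function `decode_text` and its `fallback` parameter are hypothetical, cp1252 is assumed as the ANSI code page, and UTF-32 BOMs are not handled.

```python
import codecs


def decode_text(data: bytes, fallback: str = "utf-8") -> str:
    """Decode bytes according to a leading BOM; without a BOM, use the fallback encoding."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    return data.decode(fallback)


sample = "Für 1 R"
no_bom = sample.encode("utf-8")

# Behavior described for the current version: no BOM means ANSI.
old_style = decode_text(no_bom, fallback="cp1252")

# Behavior described for the next version: no BOM means UTF-8.
new_style = decode_text(no_bom, fallback="utf-8")

# With a BOM, the fallback is never consulted.
bom_utf8 = decode_text(codecs.BOM_UTF8 + no_bom, fallback="cp1252")
bom_utf16 = decode_text(codecs.BOM_UTF16_LE + sample.encode("utf-16-le"), fallback="cp1252")

print(old_style)  # FÃ¼r 1 R
print(new_style)  # Für 1 R
```

A file with a BOM decodes the same way under both fallbacks, which is why adding a BOM works as a workaround for the current version.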