Converter with "cryptic signs"

Started by sinus, June 20, 2014, 02:05:52 PM

sinus

Hello

For testing, I created a subset of my working DB in IMatch 3.6.

The database shows no errors in 3.6 and contains 9 folders with 3800 images, 6000 categories and 30 properties (178 MB in total).

I converted it with the converter, without the cache and without using the "pending XMP" option.

But after opening the DB in IM5, the metadata shows some strange characters; you can see them in the first picture.

It seems to happen only with NEF files; the JPG files do not have these errors.

The second pic shows the same data in 3.6.

I tried this twice, with the same result each time.

Do you have an idea what the reason could be and what I could try? Since it only happens with NEF files, maybe you have a clue.

Maybe I should use the "pending XMP" option in 3.6, but because these are my real images, I'll wait for your comments first ;)
If I use this command (pending) in 3.6, all NEFs would get an XMP file, is this correct?

My plan: I want to try the conversion first with 2-3 example DBs; if that succeeds, I will convert the real DB (about 200'000 images, 6.5 GB).

Oops, I forgot the two files, here they are:

Best wishes from Switzerland! :-)
Markus

Ferdinand

I have seen this myself. I think it has something to do with the character encoding of the metadata (i.e. UTF-8 vs. non-UTF-8).

sinus

Quote from: Ferdinand on June 20, 2014, 05:35:24 PM
I have seen this myself. I think it has something to do with the character encoding of the metadata (i.e. UTF-8 vs. non-UTF-8).

Thanks, Ferdinand. I thought I had looked at this, but (shame on me) I must say, not very carefully.
I will look at this.

I think I will run some conversion tests before I convert my real DB. Currently I still work with 3.6, simply because IM3.6 is rock-stable for me and I am quite sure that nothing unexpected will happen.

I still don't know IM5 well enough, but I guess I will take this big step in about 2-6 weeks.

I am not sure whether you have done it yet, the real "migration".
Have a good time ... I'm off to watch TV now, because my country's team is playing this evening against ... les Bleus ... France.

Ferdinand

Quote from: sinus on June 20, 2014, 05:41:30 PM
I am not sure whether you have done it yet, the real "migration".

Yes, I have. I saw the same symptoms for some of my images after the migration. I realised that the database I used for the migration was a little old and had the wrong character encoding for metadata. So I changed it to "default" (I think this is the correct setting), reloaded the metadata, and the problem went away. (This is my recollection and I'm not in a position to check this setting right now.)

(I also had this issue during the beta but never got to the cause.  I think this was before Mario fixed up the metadata character encoding issue.)

Mario

This looks like a problem with the character encoding in your IPTC data.

IMatch imports IPTC into XMP.
For that it must convert the IPTC data into the UTF-8 code page used by XMP.

ExifTool checks whether the IPTC record specifies a character set encoding. IPTC supports only "ASCII/ANSI" and "UTF8".

If the IPTC record is not marked as UTF-8, ExifTool assumes the local code page of the current user.
The data is then converted from the local code page into UTF-8.
If the IPTC record is marked as UTF-8, the data is copied "as-is".

If the IPTC data is marked as UTF-8 but is not encoded as such => problem.
If the IPTC is not marked as UTF-8 but does not match the local code page of the user => problem.
If the IPTC data contains a mix of UTF-8 data and local code page data => BIG problem.
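The read logic described above can be sketched in Python to show where the first two problem cases come from. This is a minimal illustration, not ExifTool's or IMatch's actual code: the function name `decode_iptc_field`, the `marked_utf8` flag and the default local code page `cp1252` (Windows Western European) are assumptions made for the example.

```python
def decode_iptc_field(raw: bytes, marked_utf8: bool, local_codepage: str = "cp1252") -> str:
    """Hypothetical sketch of the default IPTC read logic described above."""
    if marked_utf8:
        # Record is marked as UTF-8: the bytes are taken as-is.
        # Invalid UTF-8 sequences show up as U+FFFD (displayed as a '?' box).
        return raw.decode("utf-8", errors="replace")
    # No UTF-8 mark: assume the local code page of the current user.
    return raw.decode(local_codepage, errors="replace")


latin = "längere Headline".encode("cp1252")  # b'l\xe4ngere Headline'
utf8 = "längere Headline".encode("utf-8")    # b'l\xc3\xa4ngere Headline'

print(decode_iptc_field(utf8, marked_utf8=True))    # correct
print(decode_iptc_field(latin, marked_utf8=False))  # correct
# Problem 1: marked as UTF-8, but actually encoded in the local code page.
print(decode_iptc_field(latin, marked_utf8=True))   # 'l\ufffdngere Headline'
# Problem 2: not marked, and the user's local code page does not match the data.
print(decode_iptc_field(latin, marked_utf8=False, local_codepage="cp1251"))  # wrong characters
```

Note that both the marker check and the local code page are decided per record, which is why changing the read setting and reloading the metadata can fix the display without touching the file.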

1. If you run into problems importing IPTC into XMP, try setting the IPTC Character Encoding for READ under Edit > Preferences > Metadata to a different code page.
2. Select a problem file in the File Window.
3. Press <Shift>+<Ctrl>+<F5> (Reload Metadata).
4. Check the results in the Metadata Panel.
5. Repeat until the problem is fixed.

(Note: IMatch 3 has always written UTF-8 metadata since about 2008 or so.)

Problem with write-back:

Files with such character set problems can cause problems during write-back. IMatch never rewrites the entire IPTC record; it only updates modified fields. See the help on Edit > Preferences > Metadata for detailed info.

sinus

Thanks, Ferdinand and Mario

I need to do more tests.
In IM3.6 I had the IPTC code page set to default and did not write in UTF-8.

In IM5 I had set both read and write to UTF-8.
With these preferences I had the cryptic signs, like I described in my first post.

Now I changed the language in IM3.6 to UTF-8 and also write in UTF-8 (see the picture).

Now the cryptic characters are gone, hurrah, but ... all the umlauts are still wrong.

I need to do more tests, but I don't have the time yet. I have no problem changing the code page in IM3.6 and updating all images. But I need to find out whether it is better to change something in IM3.6 or in IM5.

If you have an idea where I should start with further tests, please don't hesitate to write it here ;)

Otherwise I will try something.

Mario

The "Default" setting in IMatch 3 means UTF-8.
Please send me a file which shows up with cryptic characters (one of the original files where you saw the problem). I need to run some tests and check if the IPTC data is UTF-8 or not.

sinus

Quote from: Mario on June 22, 2014, 01:21:48 PM
The "Default" setting in IMatch 3 means UTF-8.
Please send me a file which shows up with cryptic characters (one of the original files where you saw the problem). I need to run some tests and check if the IPTC data is UTF-8 or not.

Thanks a lot, Mario, I have just sent you a NEF file.
Markus

Mario

Hi, Markus

The NEF file contains only IPTC metadata, no XMP.
I have imported the image with the default settings. No problems at all.
There is no CodedCharacterSet tag in the IPTC record, which means that it is encoded in a local code page and must be read as such.

In your screen shot above you have changed the character set from Default to UTF8. This means that IMatch is forced to assume that the IPTC data in the file is UTF8, and that is the reason for your problem.

If you leave it to default, the built-in logic is as follows: If the IPTC data is marked as UTF8, it is processed as such. If there is no UTF8 mark, the data is processed using the local code page of the user (you).
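How does a reader know whether IPTC data is "marked as UTF8"? In the IPTC-IIM format the mark is the CodedCharacterSet dataset (record 1, dataset 90), which contains the escape sequence ESC % G for UTF-8. The Python sketch below scans a raw IIM byte stream for that dataset. It is a simplified, hypothetical illustration (the function names are made up, and extended-length datasets are not handled), not how IMatch or ExifTool are implemented.

```python
import struct

UTF8_MARKER = b"\x1b%G"  # ISO 2022 escape sequence "ESC % G" designating UTF-8


def iter_datasets(iim: bytes):
    """Yield (record, dataset, data) tuples from a raw IPTC-IIM byte stream.

    Simplified: only standard datasets (length < 32768) are supported.
    """
    pos = 0
    while pos + 5 <= len(iim):
        # Each dataset starts with tag marker 0x1C, record, dataset number, 2-byte length.
        marker, record, dataset, length = struct.unpack(">BBBH", iim[pos:pos + 5])
        if marker != 0x1C:
            raise ValueError(f"expected tag marker 0x1C at offset {pos}")
        if length & 0x8000:
            raise NotImplementedError("extended datasets are not handled in this sketch")
        pos += 5
        yield record, dataset, iim[pos:pos + length]
        pos += length


def is_marked_utf8(iim: bytes) -> bool:
    # Record 1, dataset 90 is CodedCharacterSet (envelope record).
    return any(rec == 1 and ds == 90 and data == UTF8_MARKER
               for rec, ds, data in iter_datasets(iim))


def make_dataset(record: int, number: int, data: bytes) -> bytes:
    """Build one standard IIM dataset (helper for this demo only)."""
    return struct.pack(">BBBH", 0x1C, record, number, len(data)) + data


# 2:105 is the Headline dataset.
unmarked = make_dataset(2, 105, "längere Headline".encode("cp1252"))
marked = make_dataset(1, 90, UTF8_MARKER) + make_dataset(2, 105, "längere Headline".encode("utf-8"))
print(is_marked_utf8(unmarked))  # False: no 1:90 dataset, so read with the local code page
print(is_marked_utf8(marked))    # True: read as UTF-8
```

Forcing the read setting to UTF-8 amounts to skipping this check and treating every record as marked, which is what produced the broken Headline in the example below.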

Example:

When I force IMatch to read the data as UTF8, I get

[IPTC]          Headline                        : l�ngere Headline

when I use the default settings I get:

[IPTC]          Headline                        : längere Headline


Solution:

Switch the IPTC Character Encoding settings under Edit > Preferences > Metadata back to Default.

Select the file and press <Shift>+<Ctrl>+<F5> (Reload Metadata).

sinus

Cool, Mario, what a super support!

Though I must say: super support as usual! :)

Thanks a lot, I will try this as soon as possible!

sinus

Ahhh!!! Super, thanks, Mario,

thanks to your support, my metadata, converted from 3.6 to IM 5 with your DB converter, is now perfect!!!

In IM 5 I changed both read and write to default, and everything is OK now.


ONE LAST QUESTION:

Would it make sense in IM 5 to leave reading as "default" but set WRITE to UTF-8?

I ask because UTF-8 is often recommended; I believe you recommended it too.

Mario

Quote: Would it make sense in IM 5 to leave reading as "default" but set WRITE to UTF-8?

NO.

When IMatch updates IPTC data in one of your files, it never writes the entire IPTC record from scratch. It only updates the modified tags (e.g. keywords) or adds the tags if they are missing. It is very important that the existing character encoding is not changed by this operation. If you force IMatch to write UTF8 but the existing IPTC data in the file is not already UTF8, you create a mix of different character set encodings within one IPTC record. This will not only damage the record; you will also not be able to repair it.
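Why a mixed record cannot be repaired: the character set applies to the whole IPTC record, so a reader has to decode every field with one encoding. The Python sketch below uses a hypothetical record (field values chosen for illustration, cp1252 assumed as the old local code page) in which an existing Headline is cp1252 and a newly written Keywords value is UTF-8. Neither record-wide choice decodes both fields correctly.

```python
# Hypothetical record: the Headline was originally written in the local
# code page (cp1252); a later update added Keywords forced to UTF-8.
record = {
    "Headline": "längere Headline".encode("cp1252"),
    "Keywords": "Übersicht".encode("utf-8"),
}


def decode_record(fields: dict, encoding: str) -> dict:
    """Decode every field of one record with a single record-wide encoding."""
    return {name: raw.decode(encoding, errors="replace") for name, raw in fields.items()}


for enc in ("cp1252", "utf-8"):
    print(enc, decode_record(record, enc))
# cp1252: Headline OK, Keywords garbled ('Ãœbersicht')
# utf-8:  Keywords OK, Headline garbled ('l\ufffdngere Headline')
```

Repairing this would require knowing, field by field, which encoding was used, and the record only carries one character set marker for all fields.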

In the default setting, when ExifTool creates a new IPTC record (when adding IPTC data to a file which has none), it will create a UTF8-encoded block automatically.

Also: IPTC is dead-dead-dead. Don't use it. Don't add it to new files. Use XMP instead. Problem solved. XMP is always UTF8.

It's best to leave these settings to default. I have only added them so users can, for very special cases and temporarily, override the default processing. See the help for more info.

sinus

Quote from: Mario on June 23, 2014, 06:47:00 PM
Quote: Would it make sense in IM 5 to leave reading as "default" but set WRITE to UTF-8?

NO.

Everything is clear to me now, thanks a lot for your support within about 6 minutes! Thumbs up!