Hierarchical Keywords Processing

Started by jch2103, January 29, 2022, 12:31:33 AM

jch2103

I've run into an issue with hierarchical keywords. I've had some issues with processing of such keywords in output files from PhotoLab 5, but I'm not sure this is in fact a PL5 issue. Attached are zipped copies of the XMP file from the original NEF file and a JPG output file from PL5 prior to when IMatch performed a write on the file.

The XMP contains
<rdf:Description rdf:about=''
  xmlns:lr='http://ns.adobe.com/lightroom/1.0/'>
  <lr:hierarchicalSubject>
   <rdf:Bag>
    <rdf:li>animals|bird|kestrel|american kestrel</rdf:li>
   </rdf:Bag>
  </lr:hierarchicalSubject>
</rdf:Description>

and
<dc:subject>
   <rdf:Bag>
    <rdf:li>animals</rdf:li>
    <rdf:li>bird</rdf:li>
    <rdf:li>kestrel</rdf:li>
    <rdf:li>american kestrel</rdf:li>
   </rdf:Bag>
  </dc:subject>


This seems straightforward. The JPG output file (via ECP) contains
[XMP]           Subject                         : american kestrel, animals, bird, kestrel
[XMP]           Hierarchical Subject            : animals|bird|kestrel|american kestrel

which seems consistent.
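
The consistency check described above can be sketched in a few lines of Python (standard library only). This is only an illustration, not what IMatch or ExifTool do internally. The `SAMPLE` packet wraps the two XMP fragments quoted above in an `rdf:RDF` root so that it parses on its own, and the helper names `bag_items` and `unexplained_flat_keywords` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by the XMP fragments quoted in this post.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "lr": "http://ns.adobe.com/lightroom/1.0/",
}

# ASSUMPTION: a minimal standalone packet built from the quoted fragments.
SAMPLE = """<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
 <rdf:Description rdf:about=''
   xmlns:dc='http://purl.org/dc/elements/1.1/'
   xmlns:lr='http://ns.adobe.com/lightroom/1.0/'>
  <dc:subject>
   <rdf:Bag>
    <rdf:li>animals</rdf:li>
    <rdf:li>bird</rdf:li>
    <rdf:li>kestrel</rdf:li>
    <rdf:li>american kestrel</rdf:li>
   </rdf:Bag>
  </dc:subject>
  <lr:hierarchicalSubject>
   <rdf:Bag>
    <rdf:li>animals|bird|kestrel|american kestrel</rdf:li>
   </rdf:Bag>
  </lr:hierarchicalSubject>
 </rdf:Description>
</rdf:RDF>"""


def bag_items(root, tag):
    """Return the text of every rdf:li inside the rdf:Bag of the given tag."""
    items = root.findall(f".//{tag}/rdf:Bag/rdf:li", NS)
    return [(li.text or "").strip() for li in items]


def unexplained_flat_keywords(xmp_text):
    """Flat keywords that are not a path segment of any hierarchical keyword."""
    root = ET.fromstring(xmp_text)
    flat = bag_items(root, "dc:subject")
    hierarchical = bag_items(root, "lr:hierarchicalSubject")
    segments = {seg for path in hierarchical for seg in path.split("|")}
    return [kw for kw in flat if kw not in segments]


if __name__ == "__main__":
    print(unexplained_flat_keywords(SAMPLE))  # prints []
```

An empty result means every flat keyword is accounted for by a segment of the hierarchical keyword, which is the sense in which the two tags are "consistent" here.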

However, if I view the JPG metadata in the JPG in an IMatch metadata panel, I see in hierarchical keywords
animals|bird|kestrel|american kestrel; animals; bird; kestrel
## Note the extra components! See attached screenshot.

IM shows the JPG has a 'write' pending to XMP::Lightroom\HierarchicalSubject.

1. I understand why IM shows a pending write; IM wants to make sure that hierarchical keywords are reconciled to the 'flat' dc:subject keywords.
2. But why is the IM UI showing the duplicated parts of the hierarchical keywords as shown above? They aren't in the hierarchical keywords in the XMP file from the original NEF, and they don't show up in the ExifTool dump above. (A rescan changes nothing and the pencil remains.)
3. When I trigger the file write, the duplicated hierarchical keywords still display in the IM metadata panel, but now the ExifTool dump of the JPG shows the 'extended' hierarchical keywords!!
4. I've attached the zipped ECP output for the JPG before and after the IMatch write to XMP::Lightroom\HierarchicalSubject. I've also attached a copy of the IM log. I was going to attach output from the Metadata Analyst (post-write), but it only shows the usual trivial warnings.

Questions:
- Why, before I trigger the 'write', is IM displaying duplicated parts of the hierarchical keyword in a new JPG output file from PhotoLab when an ECP dump of the JPG doesn't? (They also don't show up if I use a different ExifTool tool.) Where did these come from? They apparently weren't in the JPG file anywhere that ExifTool shows.
- Why are these duplicated parts of the hierarchical keyword then added to the metadata for the JPG after the pending write?

I'm confused, but perhaps I'm missing something. This behavior is fairly recent, within the last month or two, I think. Is there other information I can provide to help resolve and understand these questions?
John

jch2103

ps - There are no Legacy IPTC tags in either of the files.
John

JohnZeman

Occasionally I'll see this happening too but for me it's not just to JPG versions.

The raw masters will also have extra keywords (@Keyword Categories) assigned to them in the same manner as you're seeing John.

For me it's only been happening to new images I add to the database but that probably doesn't mean much since I rarely re-categorize existing images in my database.

Mario

This can happen if you let your other software break up hierarchical keywords into a bunch of individual path segments.
If your Keyword Import Settings are wrong, or your thesaurus is not set up so that IMatch can find each individual keyword in your files and map it to a hierarchical keyword, IMatch will fall back to importing the flat keywords "as-is". There is only so much IMatch can do in this case.

jch2103

But my problem is that, as far as I can tell (based on the metadata in the files), the other software has not broken up the hierarchical keywords into segments. That's why I posted this.
John

Mario

But you show that ExifTool shows these flat keywords:

animals
bird
kestrel
american kestrel


and this hierarchical keyword:

animals|bird|kestrel|american kestrel

When IMatch imports the file, it imports both the flat and hierarchical keywords and consolidates.
If it cannot map the flat keywords via the thesaurus, they end up as individual keywords.

DigPeter

Is this problem related to that discussed in https://www.photools.com/community/index.php?topic=12150.msg86287#msg86287 , to which I have not found a solution? 

jch2103

Quote from: DigPeter on January 29, 2022, 06:45:18 PM
Is this problem related to that discussed in https://www.photools.com/community/index.php?topic=12150.msg86287#msg86287 , to which I have not found a solution?

It may be, but that's one of the things I'm trying to figure out...
John

jch2103

Quote from: Mario on January 29, 2022, 05:36:04 PM
But you show that ExifTool shows these flat keywords:
animals
bird
kestrel
american kestrel

and this hierarchical keyword:
animals|bird|kestrel|american kestrel

When IMatch imports the file, it imports both the flat and hierarchical keywords and consolidates.
If it cannot map the flat keywords via the thesaurus, they end up as individual keywords.

1. I entered the hierarchical keywords in IMatch for the NEF using the XMP::Lightroom\HierarchicalSubject tag; all 'flat' keywords were produced by IMatch.

2. Note that the exported JPG included only the hierarchical keyword w/o additional keyword levels before I clicked the 'pencil' for IM to perform a write.

3. In fact, my thesaurus Manager includes the entire hierarchical sequence above. So IMatch should have mapped the hierarchical sequence and not added duplications.
[2 - "What"]
...
[Nouns]
animals
{animal}
...
bird
...
kestrel
american kestrel


4. I've routinely added hierarchical keywords via a metadata panel that didn't exist in my thesaurus, but without this problem. My Metadata Settings match the default shown in the Help. See screenshot.

5. See JohnZeman's comment above. I've also seen this behavior although I haven't documented it.

So there seems to be a problem somewhere in IM's hierarchical keyword handling. Has IM keyword handling changed? Or perhaps something in ExifTool? Is something else going on?

John

JohnZeman

I believe I have solved this problem, at least it appears I have solved my version of the problem.

And it is indeed caused by PhotoLab 5.

As a test I went outside this afternoon and took 5 CR3 photos and assigned the

Things|Our Things|Garden Shed

Keyword to them.  Verified the keywords and other metadata looked good in IMatch then I sent those CR3 images to PhotoLab 5 for processing.

I didn't touch or even look at the keywords or any metadata in PhotoLab, I just optimized the images and exported them as 16 bit TIFs back to IMatch.

After IMatch imported the PhotoLab processed TIFs the keywords now show

Things
Our Things
Things|Our Things|Garden Shed

The Solution

Starting all over again with the same CR3 images I deleted those 5 TIFs from IMatch and went back to PhotoLab and exported those 5 images again as 16 bit TIFs but this time I unchecked the option to export the keywords with the TIFs (see attached screenshot).

After exporting the TIFs to IMatch a second time I used IMatch to verify that no keywords were in any of the 5 TIF images.

So far so good.

Then I sent those 5 exported TIFs without keywords to Affinity Photo and did my final processing, exporting the optimized images as high resolution JPGs to IMatch.

IMatch imported the JPGs and because I use versions IMatch propagated the metadata including the keywords from the original master CR3 images to the JPGs.

So in the end the only keyword in those 5 JPGs was the correct

Things|Our Things|Garden Shed

All looks good and everyone lived happily ever after.  ;D

jch2103

Quote from: JohnZeman on January 29, 2022, 10:59:45 PM
I believe I have solved this problem, at least it appears I have solved my version of the problem.

And it is indeed caused by PhotoLab 5.
That was the conclusion I had also come to, and there's no question that IM propagation should handle the situation.

But I'm still confused, because in my recent test (described above) the PL5 export JPG initially seemed to have only the single line hierarchical keyword w/o additions/duplications, according to my ECP metadata dump. It wasn't until IM wrote the pending hierarchical keywords that the 'extra' ones showed up. What I don't understand is where the 'extra' parts came from. Were they hiding somewhere in the JPG metadata? Or did something else happen? I'd like to figure this out, but I'm not sure how...



[BTW, DxO has fixed one metadata bug that I'd reported to them: Legacy IPTC metadata is no longer being written to export files, at least when it doesn't exist in the raw/xmp files. I do still think that DxO should implement an option to pass metadata through unchanged, as it did w/ PL4.]
John

JohnZeman

John apparently I spoke too soon last time.  :-[

The solution I previously posted does solve the PhotoLab invalid keywords issue but like you, I'm discovering that IMatch is also doing to the JPGs just what you're saying.

Using my last example if I take my 5 CR3 raw images and delete all of the keywords from them and do a writeback, when I check the CR3 masters and JPG versions there are no keywords assigned to any of the images which is good.

But then if I select the 5 CR3 masters and assign the

Things|Our Things|Garden Shed

Keyword to them and do a writeback that keyword is written to the 5 CR3 master images as it should be.
However sometimes the JPG versions for those masters will end up with the following keywords.

Things|Our Things|Garden Shed
Our Things
Things

But then if I then select the 5 CR3 masters and do an F4,P to manually propagate metadata from masters to versions the problem goes away, the JPGs will then only have the correct

Things|Our Things|Garden Shed

Keyword assigned.

I've just repeated this test 5 times and 3 of those 5 tests resulted in invalid keywords being added to the JPGs until I did an F4,P to correct the problem.

This is so strange, it appears there are 2 ways this invalid keyword problem can happen.  With invalid keywords being generated by PhotoLab and by IMatch when it writes back metadata.

Edit: I just did 5 more tests with 5 new images and the problem occurred every time, now the problem is easy to reproduce.  But like before F4,P corrects the problem.

jch2103

Another test:

I used the NEF/xmp used in the initial example, and processed in PL5 with the Include None (i.e., no metadata export) option. No keywords are present in the output JPG according to IMatch; ECP List Metadata shows no XMP keywords or other keywords. The NEF/xmp contains (according to IMatch) only Hierarchical Keyword `animals|bird|kestrel|american kestrel`.

I then did a Clipboard Copy from the NEF/xmp and a Clipboard Paste Attributes and Data (XMP Data only) to the JPG. Result: the JPG Hierarchical Keyword now contains `animals|bird|kestrel|american kestrel; animals; bird; kestrel`.

Where did those extra parts come from? (As noted above, my Thesaurus includes an entry for this keyword.)


I then ran the ECP 'Delete all Metadata' (-overwrite_original_in_place -all= {Files}). It worked (of course).
I then repeated the above Clipboard Paste Attributes and Data (XMP Data only). Result: the JPG Hierarchical Keyword now contains only `animals|bird|kestrel|american kestrel`. (In both cases, dc:subject contains `american kestrel; animals; bird; kestrel`.)
But then I ran this test again, including the paste of XMP Data only. This time the JPG Hierarchical Keyword contains `animals|bird|kestrel|american kestrel; animals; bird; kestrel`!!

I'm confused!!





John

DigPeter

I am experiencing similar behaviour.  Unfortunately I still cannot get version propagation to work.

Mario

I'm not sure I've followed this entire thread correctly.
I tried to reproduce the results of the initial post.

1. I've added the keyword animals|bird|kestrel|american kestrel to a file, and used the command to add it also to the thesaurus.

2. I changed the default settings under Edit > Preferences > Metadata to
Write hierarchical keywords: on
Write path elements: on
This causes IMatch to split the hierarchical keyword into segments and to write each segment.

3. Write back. The resulting output file now contains:

[XMP-dc]        Subject                         : animals, bird, kestrel, american kestrel
[XMP-lr]        Hierarchical Subject            : animals|bird|kestrel|american kestrel


which is correct.

4. Now I remove the file from the database to rule out any possible side-effects of protection and the "don't replace existing hierarchical keywords".

5. I make sure that the option Keyword lookup via thesaurus is on (Default).

6. I add the file again to the database. IMatch reads the flat keywords in dc:subject and the hierarchical keywords in lr:hierarchicalSubject.
This produces this list of keywords to import:

animals
animals|bird
animals|bird|kestrel
animals|bird|kestrel|american kestrel


Note that the flat keyword american kestrel is not imported because it matches the leaf (bottom) level of an existing hierarchical keyword in the thesaurus.
The same happens for the other flat keywords, they are mapped to the existing hierarchical keywords based on the leaf level:

animals => animals
bird => animals|bird
kestrel => animals|bird|kestrel
american kestrel => animals|bird|kestrel|american kestrel


These keywords are then imported into the database and mapped back to the flat XMP keywords based on the Edit > Preferences > Metadata settings.
The resulting list of keywords in IMatch looks like this, which I think is correct:



There is no way for IMatch to somehow convert a list of unrelated and unlinked flat keywords like animals, bird, kestrel, american kestrel and fold them into one hierarchical keyword. Any keyword may appear multiple times at different levels in the thesaurus, producing many different hierarchical keywords...
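
The round trip in steps 2 to 6 can be illustrated with a toy model in Python. This is a sketch of the behaviour as described in this post, not IMatch's actual implementation: the function names are hypothetical, the thesaurus is reduced to a plain list of full keyword paths, and picking the first thesaurus match for an ambiguous leaf is an assumption made only to keep the example short.

```python
def flatten(hierarchical):
    """Model 'Write path elements: on': emit every path segment once."""
    flat = []
    for path in hierarchical:
        for segment in path.split("|"):
            if segment not in flat:
                flat.append(segment)
    return flat


def build_leaf_index(thesaurus):
    """Map each node name to the thesaurus paths that end at that node."""
    index = {}
    for path in thesaurus:
        parts = path.split("|")
        for depth in range(1, len(parts) + 1):
            prefix = "|".join(parts[:depth])
            paths = index.setdefault(parts[depth - 1], [])
            if prefix not in paths:
                paths.append(prefix)
    return index


def import_keywords(flat, hierarchical, thesaurus):
    """Merge flat and hierarchical keywords, mapping flat ones by leaf level."""
    index = build_leaf_index(thesaurus)
    result = list(hierarchical)
    for keyword in flat:
        candidates = index.get(keyword, [])
        # ASSUMPTION: take the first match; a keyword may occur at several
        # places in a real thesaurus, and how IMatch resolves that is not
        # covered here. Without any match the flat keyword is kept as-is.
        mapped = candidates[0] if candidates else keyword
        if mapped not in result:
            result.append(mapped)
    return sorted(result)


if __name__ == "__main__":
    hier = ["animals|bird|kestrel|american kestrel"]
    flat = flatten(hier)
    for keyword in import_keywords(flat, hier, thesaurus=hier):
        print(keyword)
```

With the thesaurus containing the full path, the model yields the four keywords listed in step 6. With an empty thesaurus the flat keywords cannot be mapped and come through unchanged next to the hierarchical keyword, which is the fallback described earlier in this thread.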

JohnZeman

I guess my question boils down to, with the Edit > Preferences > Metadata Options set as you say, and that's the way I have them set, everything works as it should when assigning keywords to the master image.

Hierarchical keywords are written to the

{File.MD.XMP::Lightroom\hierarchicalSubject\HierarchicalSubject\0}

Tag and leaf keywords are written to the

{File.MD.XMP::dc\subject\Subject\0}

Tag.

So far so good.

But when I do a writeback the leaf keywords are also added to the

{File.MD.XMP::Lightroom\hierarchicalSubject\HierarchicalSubject\0}

tag in addition to the hierarchical keywords, but only in the JPG version.

That is until I do an F4,P to propagate metadata, then the leaf keywords are removed from the

{File.MD.XMP::Lightroom\hierarchicalSubject\HierarchicalSubject\0}

Tag of the JPG version.

Mario

So propagation is what causes this? The OP wrote about trouble with keywords written by PhotoLab 5? That's the post I've referred to in my experiment.
Do you propagate XMP data or only keywords?

JohnZeman

Quote from: Mario on January 30, 2022, 06:01:21 PM
Do you propagate XMP data or only keywords?

I propagate all XMP data except rotation.  Uh oh....maybe that's where my problem is?  :-[

Mario


JohnZeman

Thinking about it more, why would propagating all XMP data from master to version propagate leaf keywords to the Hierarchical Keywords tag of the version when the master has no leaf keywords assigned to the Hierarchical Keywords tag?

Mario

Propagation is performed using ExifTool.
IMatch first writes-back the master and then copies (via ExifTool) the tag groups you have requested from the master to all versions.
After that, both the master and version are re-imported to update the database with the current file contents. Which also involves keyword import and mapping via the thesaurus, applying the options you have configured for the things I mention in my post above.

Digging deeper into this would require all your metadata settings, which data you propagate, the keywords you add to the master, your thesaurus etc. And a lot of my time.
Please open a bug report so we have all this info in one place, and don't mix propagation issues with issues caused by another application when writing keywords.

JohnZeman

Ok I will do some more testing to see if I can narrow this down better.

The reason I added my input to this report is because the end result seems to be the same whether the problem is caused by PhotoLab improperly exporting hierarchical keywords or something in IMatch.

One way or another it seems leaf keywords are sometimes getting added to the Hierarchical Keywords tag of the version.

Thanks Mario.

Mario

Quote
One way or another it seems leaf keywords are sometimes getting added to the Hierarchical Keywords tag of the version.

Yes. This is what I've explained above in my post. When it happens, why it happens, and why it cannot be handled differently.
This also depends on how you let IMatch flatten keywords, your thesaurus etc.

jch2103

Mario - Thanks for the extended explanation. The benefit of IMatch is its flexibility in handling almost any kind of metadata issue; the drawback is its flexibility in handling almost any kind of metadata issue...

After looking at my tests, those of other posters and your notes, I think I've concluded that what I had originally thought was an issue with DxO PhotoLab exports isn't.

However, the issue I see (the addition of leaf keywords in hierarchical keywords in output JPG files) didn't seem to occur before PhotoLab was updated to version 5. But the issue doesn't seem to occur consistently after that, either.

Like JohnZeman, I need to do more testing.


@DigPeter -
Quote from: DigPeter on January 30, 2022, 01:23:03 PM
I am experiencing similar behaviour.  Unfortunately I still cannot get version propagation to work.
Perhaps this is worth a separate post? I recently figured out why my Versioning link expression wasn't always working (incorrect regular expression).
John

Mario

I recommend not to split hierarchical keywords into individual segments during write-back. Keep them whole, which solves a lot of problems.
Edit > Preferences > Metadata: Write path elements set to off.

JohnZeman

Quote from: Mario on January 30, 2022, 10:10:59 PM
I recommend not to split hierarchical keywords into individual segments during write-back. Keep them whole, which solves a lot of problems.
Edit > Preferences > Metadata: Write path elements set to off.

Mario doing that does appear to have solved my problem.  Hopefully John (jch2103) will find a solution to his issue too.

Tveloso

I found that I have quite a few files with this issue also, and in all cases that I checked, the "out-of-sorts" keywords were in Version files only...but not actually in the files, but as pending write-back values for them.  When I looked at the keywords that were actually in the files via the ECP, they were identical between the Master and Version.

So the keywords were written correctly to the Master, and the Version, during the initial write-back of the Master (they are identical on disk), but then IMatch arrived at different keywords for the Version only, and set them as pending write-back.

JohnZeman and jch2103, do you find this to be true for you as well?...(that the keywords are identical on disk for the Master and Version, and only IMatch shows the incorrect ones at the Version)?
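
The on-disk comparison described above could be scripted roughly as follows. This is a hedged sketch: it assumes the JSON came from something like `exiftool -j -G1 -XMP-dc:Subject -XMP-lr:HierarchicalSubject <master> <version>`, that a list tag with a single value may appear as a plain string rather than an array, and the file names and function names are made up for illustration.

```python
import json

# Tag keys as they would appear with group names (-G1) in JSON output.
TAGS = ("XMP-dc:Subject", "XMP-lr:HierarchicalSubject")


def as_set(value):
    """Normalise a missing, scalar, or list tag value to a set of strings."""
    if value is None:
        return set()
    if isinstance(value, list):
        return {str(item) for item in value}
    return {str(value)}


def keyword_differences(exiftool_json, master, version):
    """Return per-tag differences between two files; empty dict if identical."""
    records = {rec["SourceFile"]: rec for rec in json.loads(exiftool_json)}
    diffs = {}
    for tag in TAGS:
        in_master = as_set(records[master].get(tag))
        in_version = as_set(records[version].get(tag))
        if in_master != in_version:
            diffs[tag] = {
                "only_in_master": sorted(in_master - in_version),
                "only_in_version": sorted(in_version - in_master),
            }
    return diffs


# ASSUMPTION: hypothetical file names and values, shaped like ExifTool JSON.
SAMPLE_JSON = """[
 {"SourceFile": "shed.cr3",
  "XMP-dc:Subject": ["Things", "Our Things", "Garden Shed"],
  "XMP-lr:HierarchicalSubject": "Things|Our Things|Garden Shed"},
 {"SourceFile": "shed.jpg",
  "XMP-dc:Subject": ["Things", "Our Things", "Garden Shed"],
  "XMP-lr:HierarchicalSubject": "Things|Our Things|Garden Shed"}
]"""

if __name__ == "__main__":
    print(keyword_differences(SAMPLE_JSON, "shed.cr3", "shed.jpg"))  # prints {}
```

An empty result for a pair whose Version still shows a pending write-back in IMatch would match the situation described here: the files agree on disk and only the database values differ.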

Quote from: Mario on January 30, 2022, 06:01:21 PM
So propagation is what causes this?
It does seem that propagation is causing this.  In my case there were no other applications involved, and the Keywords were entered in, and written by, IMatch only.  Only Version files (which receive their keywords via propagation), have the issue.

When I clicked the pencil in the Keyword Panel's entry field, to cause a pending write-back for one of the Masters (and I performed that write-back) this then fixed the issue on its Version (but that second write-back didn't actually change "anything" on disk - the keywords were always correct there - it just fixed the incorrect Keywords for the Version in IMatch).  So during the second write-back, IMatch didn't arrive at a keyword change for the Version file as it did during the first one.

I also found Master/Version pairs that didn't have this issue, yet had the exact same keywords assigned to the master.  And I'm pretty sure that the files without the problem were originally written-back in the same batch as the files with the problem, so as John and John have both pointed out, this is not happening consistently...(so this may be a tough one to identify).

I'll try to do some more testing as well...
--Tony

jch2103

Quote from: Tveloso on February 01, 2022, 05:07:46 AM
I found that I have quite a few files with this issue also, and in all cases that I checked, the "out-of-sorts" keywords were in Version files only...but not actually in the files, but as pending write-back values for them.  When I looked at the keywords that were actually in the files via the ECP, they were identical between the Master and Version.

So the keywords were written correctly to the Master, and the Version, during the initial write-back of the Master (they are identical on disk), but then IMatch arrived at different keywords for the Version only, and set them as pending write-back.

JohnZeman and jch2103, do you find this to be true for you as well?...(that the keywords are identical on disk for the Master and Version, and only IMatch shows the incorrect ones at the Version)?

Yes, that's what I'm seeing. I'm still trying to sort all this out, including how propagation variations affect this. In my particular case, the JPG files are being created by DxO PhotoLab, but they appear (based on ExifTool) to contain 'correct' metadata (specifically the same contents for dc:subject and lr:hierarchicalSubject as were in the original NEF file). It's when IM does a write-back that things get 'out-of-sorts'.
John

JohnZeman

Quote from: Tveloso on February 01, 2022, 05:07:46 AM
JohnZeman and jch2103, do you find this to be true for you as well?...(that the keywords are identical on disk for the Master and Version, and only IMatch shows the incorrect ones at the Version)?

I did see the same thing but not any more since I followed Mario's suggestion up above to disable the Edit > Preferences > Metadata > Write path elements option.

My problem has cleared now.
Of course this also means when I propagate metadata I no longer have leaf keywords in the {File.MD.XMP::dc\subject\Subject\0} tag but I don't care about that.

jch2103

Mario - Can you expand here a bit on what's covered in the Help in Write path elements?

1. Does this refer to both dc:subject and lr:hierarchical Keywords? In any event, I'm finding that whether Metadata/Keyword Export/write path elements is set to either ON or OFF, IM is writing leaf elements to lr:hierarchical Keywords only for my output JPG files. For this particular test, I set my Preferences link expression for NEF and NEF versioning such that buddy files and versioning should NOT be functioning. 

I don't recall creation of 'extra' leaf keywords in lr:hierarchical Keywords happening in the past (> ~ 6 months ago). Did something in IM/ExifTool change? Is there another setting that's causing this?


2. The Help also discusses Keyword Import. It says that IM automatically imports lr:hierarchical Keywords (good). For this test, I have 'write path elements' OFF and Keyword lookup disabled. If I already have a hierarchical keyword in lr:hierarchical Keywords, thesaurus lookup shouldn't be necessary, as there's no reason to create hierarchical keywords from the dc:subject tag. In this test, IM writes 'extra' leaf keywords in lr:hierarchical Keywords just in the JPG file. Why?

However, if I turn thesaurus lookup On (with the thesaurus containing the hierarchical keyword used in this test), IM again writes 'extra' leaf keywords in lr:hierarchical Keywords in just the JPG file. Again, why?

JohnZeman is apparently more successful:
Quote
Of course this also means when I propagate metadata I no longer have leaf keywords in the {File.MD.XMP::dc\subject\Subject\0} tag but I don't care about that.
I don't care about hierarchical keywords in dc:subject either, I think.
John

Tveloso

This is just a little bit off the subject, but when I was initially reviewing this in my IMatch Database, after having first read this topic (but without having it to refer to), and finding that I also had version files with this issue, I thought that I'd try what I remembered JohnZeman had mentioned fixed the issue for him - a forced propagation.  I couldn't remember the Keyboard shortcut he cited (F4-something), so I looked at the context menu for a propagation command, and found that I had an F4,T that was greyed out (and completely missed the F4,P just above it - which is what John had actually said):

   

So I proceeded to force the Master to be in need of a Write-back (which it wasn't before - only the version was), and then that Write-back (and the new propagation that happened with it) fixed the version file.  I was worried that it would be difficult to do this for all my Master/Version pairs that had the issue, until I re-read the post, and saw that in fact F4,P was available, and that did in fact correct the issue for me as well (I just filtered for Masters only in the Folder where the issue was present, and F4,P triggered the pending writeback for the Versions, and brought their keywords back in synch with the Masters).

So what is the difference between those two commands?...(F4,P - Propagate data to Versions and F4,T - Propagate)

But back to the matter at hand...this does appear to be a new issue (although I only recently started using versioning and propagation, so can't be sure). 

Quote from: JohnZeman on February 01, 2022, 10:56:27 PM
I did see the same thing but not any more since I followed Mario's suggestion up above to disable the Edit > Preferences > Metadata > Write path elements option.

My problem has cleared now.
Of course this also means when I propagate metadata I no longer have leaf keywords in the {File.MD.XMP::dc\subject\Subject\0} tag but I don't care about that.
But I actually would like to have (most of) the path elements from lr\hierarchicalSubject as separate keywords in dc\subject...and again, it all behaves as expected in the Master file - only the version winds up with the issue.  Hopefully this does turn out to be a bug that Mario will fix for an upcoming release...but in the meantime, F4,P is a good work-around.
--Tony

thrinn

Quote from: Tveloso on February 02, 2022, 03:54:41 AM
So what is the difference between those two commands?...(F4,P - Propagate data to Versions and F4,T - Propagate)
If I remember correctly, F4,P is only available if the focused file is a master. It triggers the propagation from the master to its versions.
On the other hand, F4,T is only available if the focused file is a version (that's why it is in the group of version-related commands in the context menu).  I think it does the same as F4,P, only from the viewpoint of a version. I am not sure what exactly would happen when a version has more than one master (never a good idea).
Thorsten
Win 10 / 64, IMatch 2018, IMA

Tveloso

Thank you Thorsten.

So this gives us a little better work-around...we can filter for the Versions instead, and in addition, filter for files with Pending writeback (these will be the files with the issue), and then trigger the propagation with F4,T.

Thanks again.
--Tony

DigPeter

Quote
@DigPeter -
Quote from: DigPeter on January 30, 2022, 01:23:03 PM
I am experiencing similar behaviour.  Unfortunately I still cannot get version propagation to work.
Perhaps this is worth a separate post? I recently figured out why my Versioning link expression wasn't always working (incorrect regular expression).

@jch2103
Thanks John - This was the unsolved topic in https://www.photools.com/community/index.php?topic=12211.msg86639#msg86639