Hierarchical keywords and "autoclassification"

Started by maxbelloni, November 13, 2023, 08:56:51 AM

maxbelloni

Hi,

A simple question (and probably a silly one, since I am still new to the current IMatch version):

A file with hierarchical categorization contains keywords like aaa|bbb|ccc.

If you upload an image containing these keywords to some other database that provides automatic classification based on keyword contents (e.g. stock photography), the keyword format aaa|bbb|ccc prevents such "autoclassification".

A possible solution could be to store the keywords separately in the files but keep them hierarchical only in the IMatch database (maybe this is already possible, but I don't know how).

How do you deal with this issue?

(Again, sorry for the possibly silly question.)

Max

Mario

IMatch internally only works with hierarchical keywords in the XMP-lr namespace.
During write-back, these keywords are "flattened" into the legacy IPTC keywords tag (if it exists) and into the XMP-dc:Subject tag.
How keywords are flattened depends on your settings under Edit > Preferences > Metadata.
IMatch can flatten each level, only the leaf level etc. See "Write hierarchical keywords" for more information. That section explains how this works and the potential pitfalls you may run into.

Breaking up a hierarchical keyword like "location|beach|daytona" into 3 separate keywords "location", "beach" and "daytona" makes it impossible for IMatch to map the 3 keywords back to one keyword if it ever has to re-import the file. If your thesaurus is up-to-date and has the hierarchical keyword:

location
  |- beach
    |- daytona

IMatch will map/assign the flat keywords to each level of the hierarchy. It cannot "re-combine" the 3 flat keywords it finds in the file into the single hierarchical keyword they came from. If you let IMatch write only the leaf keyword "daytona", re-import will work just fine.
That's why the "Don't replace existing hierarchical keywords" option is on by default.
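
The round-trip problem can be illustrated with a minimal Python sketch. This is a simplified model, not IMatch's actual import logic: the THESAURUS dictionary and all three helper functions are hypothetical names made up for the example.

```python
# Hypothetical thesaurus: flat keyword -> full hierarchical path.
THESAURUS = {
    "location": "location",
    "beach": "location|beach",
    "daytona": "location|beach|daytona",
}

def flatten_all_levels(hier_keyword):
    """Write every level as a separate flat keyword."""
    return hier_keyword.split("|")

def flatten_leaf_only(hier_keyword):
    """Write only the last (leaf) level."""
    return [hier_keyword.split("|")[-1]]

def reimport(flat_keywords):
    """Map each flat keyword to a thesaurus path, one keyword at a time."""
    return [THESAURUS.get(k, k) for k in flat_keywords]

original = "location|beach|daytona"
all_levels_roundtrip = reimport(flatten_all_levels(original))
leaf_roundtrip = reimport(flatten_leaf_only(original))

print(all_levels_roundtrip)  # one hierarchical keyword per level
print(leaf_roundtrip)        # only the original keyword
```

In this model, writing every level and re-importing yields three hierarchical keywords (one per level) instead of the single original one, while writing only the leaf maps back to exactly the original keyword.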

Which keyword tag does your stock photo web site read and why would it fail when it contains a hierarchy?

maxbelloni

Hi Mario,

Quote from: Mario on November 13, 2023, 09:10:11 AM
Which keyword tag does your stock photo web site read and why would it fail when it contains a hierarchy?

Basically, the major stock agencies work this way.

Below is an example I made for this purpose (the DeepMeta program for Getty/Istockphoto), where you can see that the hierarchical geographical keywords are treated as a single keyword, which makes them incomprehensible to the system.

I can understand your approach and the difficulty of dealing with the problem.

Maybe you could add an "export format" for pictures with the keywords broken up (separated), so they are ready for this use without disrupting the IMatch database?


Mario

Do they import XMP keywords (XMP-dc:Subject) or legacy IPTC keywords or both?

If they use legacy IPTC keywords, and you use the IMatch Batch Processor to produce your outputs, you can use the Metadata options to copy keywords to the output file using the Custom Metadata option.

For example, this variable:

{File.MD.hierarchicalkeywords|foreach:-keywords={value|splitlist:|,last}#;replace:#=={cr}{lf}}
copies the leaf level of each hierarchical keyword into the IPTC keywords tag.

The hierarchical keywords "location|beach|daytona" and "motive|portrait" are written as "daytona" and "portrait" to the output file. This assumes that you don't use one of the other metadata options to copy keywords already.

The variable splits each hierarchical keyword at the | character and then uses last to get the last part.
It combines this with the -keywords command line argument, which adds keywords to a file in ExifTool, and also adds a # placeholder. The placeholder # is then replaced with carriage-return / linefeed to put each -keywords=bla on its own line in the arguments file the Batch Processor produces for ExifTool, e.g.

-keywords=daytona
-keywords=portrait

and this produces the required result of all leaf level keywords without hierarchy.
Since the Batch Processor produces a new file, it won't interfere with your database contents at all.
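
For readers who want to see the logic outside IMatch, here is a minimal Python sketch of what the variable above does. It is an emulation for illustration only, not how IMatch implements it; the function name leaf_keyword_args is hypothetical, and the lines are joined with CR/LF rather than having CR/LF appended after each entry.

```python
def leaf_keyword_args(hier_keywords, tag="keywords"):
    """Return one ExifTool-style argument line per leaf keyword."""
    lines = []
    for keyword in hier_keywords:
        leaf = keyword.split("|")[-1]  # same idea as splitlist:|,last
        lines.append(f"-{tag}={leaf}")
    # CR/LF between entries, mirroring the {cr}{lf} replacement
    return "\r\n".join(lines)

args = leaf_keyword_args(["location|beach|daytona", "motive|portrait"])
print(args)
```

With the two sample keywords from this post, the sketch produces the same two -keywords= lines shown above.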

If you don't want to use the BP, you can get the same result via Edit > Preferences > Metadata.


maxbelloni

I've copied the same image into a directory without any sidecar file. The keywords are embedded in the file itself, and the upload phase does not mention any XMP sidecar file, so I believe they just look at the "old style" embedded IPTC keywords. (And, of course, I'll do as you suggest. As always, an exhaustive answer 👍.) Thanks!

maxbelloni

I forgot: sometimes the upper-level keywords of the hierarchy are useful too. Is it possible to save ALL the keywords (still as single keywords, of course)?

Mario

You can repeat the variable any number of times, replacing "last" with 0,1,2,3 .... to get each keyword on each level.
Never tried that, though. Use VarToy to test the result for some of your files.

Why don't you let IMatch just flatten your keywords and write each level? This would automatically produce what your service requires. When you enable both "Write hierarchical keywords" and "Write path elements" and you write back after changing keywords, XMP-dc:Subject contains the keywords: Portrait, location, beach, daytona.
Getty should pick up the keywords just fine when you upload the file afterwards.

This will only create an issue when you remove the file from the database and add it again at a later time.
Then IMatch would find and import the flat keywords, without a way to map them back to their origin hierarchical keywords.

As long as the "don't replace hierarchical keywords" option is enabled and the file remains in the database, IMatch ignores the flat keywords whenever the file has hierarchical keywords.

Mario

I have added a new variable formatting function that makes this step a lot easier (and many other things I can think of).

Example: for a file with the hierarchical keywords location|beach|daytona;motive|portrait, the variable

splicelist:~;|,-keywords=,{cr}{lf}

returns

-keywords=location
-keywords=beach
-keywords=daytona
-keywords=motive
-keywords=portrait



maxbelloni

It's probably my mistake, but with 2023.4.4 it looks like the Batch Processor no longer copies any metadata into the new files.

Whether I use
{File.MD.hierarchicalkeywords|foreach:-keywords={value|splitlist:|,last}#;replace:#=={cr}{lf}}

or
splicelist:~;|,-keywords=,{cr}{lf}
or select any metadata to be copied to the new file, all the metadata is missing in the new file.

Am I doing something wrong? (probably...)

Mario

F-word. Stupid me. I forgot or accidentally erased one exclamation mark.

This problem is caused by an improvement I made in response to a report that the Batch Processor appeared to stop making progress while metadata is written or erased for larger batches of files. A new message is now displayed in the BP UI, and "user abort" checks are made more often. The "user abort" check in the "write metadata" code portion wrongly thinks the user has aborted the writing of metadata and exits. I will include a fix for this in the next release.

maxbelloni

Any new version may introduce bugs! It happens to everyone, all the time.

As always, thanks for the superfast answer (and solution).

Max

maxbelloni

Hi Mario,

V2023.4.6 fixes the metadata writing issue. Very good!

Now my question is how to combine the two instructions you suggested so that all the keywords (coming from the hierarchical structure) are written separately into the IPTC data of the file. That would also help me a lot in understanding the custom metadata language better.

Thanks!

Max

Mario

I answered your email about the same question yesterday already, with an image of where to put the variable in the Batch Processor. Did you not receive it?

There is even a fully-working example in the corresponding help topic: splicelist:Split char{,Prefix}{,Postfix}{,trimdupes}

Just copy and paste this variable:

{File.MD.hierarchicalkeywords|splicelist:~;|,-keywords=,{cr}{lf},trimdupes}

into the custom metadata option in the Batch Processor and you're done.
This writes all hierarchical keywords in their flattened form into the output file.
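
As a rough illustration of the result, here is a minimal Python emulation of the splicelist idea. It is a sketch under assumptions, not IMatch's implementation: it assumes the ~;| argument means "split at either ; or |" and that trimdupes drops repeated entries while keeping the first occurrence. The function signature is hypothetical, and the "miami" keyword is made-up sample data to show the duplicate handling.

```python
import re

def splicelist(value, split_chars=";|", prefix="", postfix="", trimdupes=False):
    """Split value at any of split_chars and wrap each part in prefix/postfix."""
    pattern = "[" + re.escape(split_chars) + "]"
    parts = [p.strip() for p in re.split(pattern, value)]
    parts = [p for p in parts if p]  # ignore empty entries
    if trimdupes:
        parts = list(dict.fromkeys(parts))  # drop repeats, keep first-seen order
    return "".join(prefix + p + postfix for p in parts)

keywords = "location|beach|daytona;location|beach|miami;motive|portrait"
result = splicelist(keywords, prefix="-keywords=", postfix="\r\n", trimdupes=True)
print(result)
```

Without the duplicate removal, "location" and "beach" would each be emitted twice for this sample, once per hierarchical keyword that contains them.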

You can see the result of this variable in the IMatch VarToy app (open it from the App Manager).

For more info, see "Variables" and "Using Custom Metadata".


digi56

Hi Mario, thanks for the quick update.
I was in the middle of writing up a bug report about the missing camera data and quickly checked whether a newer version might already be available.
SUPER, and THANK YOU!
Regards, Fredi

maxbelloni

Hi Mario,

Quote from: Mario on November 16, 2023, 09:48:22 PM
I answered your email about the same question yesterday already, with an image of where to put the variable in the Batch Processor. Did you not receive it?

I checked my mail, but I did not find your message. I'm sorry to have bothered you again :-[

Now it's totally clear how to do it (and, of course, it's perfectly working!)

Again thanks!

Max

Mario

I have added 4 more tag sets to the Batch Processor:

1. XMP: Leaf keywords of hierarchical keywords
2. XMP: Hierarchical keyword levels as individual keywords
3. IPTC: Leaf keywords of hierarchical keywords
4. IPTC: Hierarchical keyword levels as individual keywords

For a file with the keywords:

Location|Beaches|Daytona
Motive|Portrait

1. and 3. emit "Daytona" and "Portrait"
2. and 4. emit "Location", "Beaches", "Daytona", "Motive" and "Portrait"

Basically these groups do the same as the variables, just simpler. I hope this will help users with the same requirements.
Let me know if there are other issues.

maxbelloni

Wow! I can't think of any other possible combination. I believe this need is now more than covered, even beyond what I personally could need.

Thanks Mario, you're the best!  ;D

Max