How to eliminate identical keywords and add all synonyms from the path?

Started by LRAT, May 11, 2020, 05:30:24 AM

LRAT

Hi,
I am currently evaluating IMatch, and so far it has the potential to be what I was looking for.
I have to admit, it's a steep learning curve but I like the flexibility of the software very much.
The main reason why I am interested in IMatch is because I need to add keywords to about 30,000 pictures.
This is a big job and I want to develop the best strategy possible before I jump into the deep end.

The biggest hurdle I've got is as follows:

I use a structured keyword list (which I developed myself). There are about 2,000 words in this list, and I have given synonyms to many of them.
However, whenever I select a keyword, the synonym is also added (as intended), but it also comes with the full path. This leads to duplication of keywords:

Example:
From my keyword list I look for "Giant tortoise", the synonym for Giant Tortoise is "Testudo Gigantea".
So I would expect to see something like this in the keyword list:
~Animals|Wildlife|Reptiles|Turtle|Tortoise|Giant Tortoise|Testudo Gigantea

However, what I see is this:
~Animals|Wildlife|Reptiles|Turtle|Tortoise|Giant Tortoise;
~Animals|Wildlife|Reptiles|Turtle|Tortoise|Testudo Gigantea

In this string there are five (Or six, if you look closely) words which are identical.
The question is: How can I get rid of these identical words?

Also, in the same process several synonyms are dropped. Let me explain:
For "Wildlife" I've got a synonym "Undomesticated animals"
For "Reptiles" I've got a synonym "Reptilia"
For "Tortoise" I've got a synonym "Testudinidae"
All these synonyms have been stripped out of the string and the only synonym added is the very last one.

So, the string I would like to see is as follows:

~Animals|Wildlife|Undomesticated animals|Reptiles|Reptilia|Turtle|Tortoise|Testudinidae|Giant Tortoise|Testudo Gigantea

If one of the "Tortoise" words also disappeared, I wouldn't mind, as the two words "Giant" and "Tortoise" would still show up in the string, and a search for "Giant Tortoise" would give the same result.

I have attached a screen shot of all this.

Could somebody please tell me what I need to do to achieve all this?

Many thanks in advance.

Luc

Mario

Quote:
However, whenever I select a keyword the synonym will also be added (As intended) but comes also with the full path. This leads to duplication of keywords

This is exactly the intended behavior.
None of the established keyword standards has a notion of synonyms.
There are just keywords.

Synonyms in IMatch allow you to assign one keyword to a file, and then let the system add one or more other keywords (synonyms). Of course the synonyms must be on the same hierarchical level, else the synonyms would produce other keywords.

When assigned, the synonyms become regular keywords, suitable for exchange with other applications, clients, and services.



LRAT

Thanks Mario for your fast response!

OK, if that was the intent, then what happened to the other synonyms that were stripped out along the path?
In the list mentioned above I have other synonyms such as "Undomesticated animals", "Reptilia" and "Testudinidae". These are not included in the generated keyword string.
Also, if a keyword has multiple synonyms, the program picks only one synonym and disregards the rest.
This doesn't make much sense to me. Could you please elaborate on that and explain your thought process?
No disrespect intended and thank you!
Cheers,

Luc

Mario

IMatch does not assign "synonyms along the path" when you assign a keyword.
It assigns the keyword and the synonyms of that keyword.

Each keyword is a unique endpoint, so to speak.
location|beach|Daytona is one keyword. Only synonyms of Daytona are assigned when you assign this keyword.
IMatch does not assign synonyms of other keywords, e.g. synonyms you might have added to location or beach. Because

location
location|beach
location|beach|Daytona

are all different keywords.
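
The rule described above can be modeled in a few lines of Python. This is only a sketch: the `THESAURUS` dictionary, its example synonyms, and the `assign` function are hypothetical and are not IMatch's API or its actual implementation. It assumes that a synonym becomes a keyword on the same hierarchical level as the keyword it belongs to, as in Luc's example output.

```python
# Hypothetical thesaurus: full keyword path -> synonyms of exactly that keyword.
THESAURUS = {
    "location": ["place"],
    "location|beach": ["shore"],
    "location|beach|Daytona": ["Daytona Beach"],
}


def assign(path, thesaurus):
    """Return the keywords a file receives when `path` is assigned.

    Only synonyms registered for `path` itself are used; synonyms of
    the intermediate levels (location, location|beach) are ignored.
    """
    parent, _, _leaf = path.rpartition("|")
    result = [path]
    for synonym in thesaurus.get(path, []):
        # The synonym sits on the same level as the assigned keyword.
        sibling = f"{parent}|{synonym}" if parent else synonym
        if sibling not in result:
            result.append(sibling)
    return result


print(assign("location|beach|Daytona", THESAURUS))
```

In this model, "shore" and "place" never appear in the result, because they belong to different keywords than the one being assigned.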

graham1

I thought I would reply to this, since it sounds as if Luc may perhaps have similar uses and issues to mine.  I apologise to Mario in advance: I know that you do not like long posts, but if this helps a prospective buyer, I am sure you will not mind too much!

1.  Firstly, I can see why Luc's string "Animals|Wildlife|Undomesticated animals|Reptiles|Reptilia|Turtle|Tortoise|Testudinidae|Giant Tortoise|Testudo Gigantea" cannot work.  "Tortoise" and "Testudinidae" are at the same level, and "Giant Tortoise" cannot be at a single next lower level to each of them at the same time, hence the two strings.

2.  Secondly, I have previously raised the issue of synonyms higher than leaf level not being included.  Mario has explained, however, that there is no inherent difference between a keyword and a synonym, except in the way this is seen in the database.  Therefore, to include synonyms above leaf level as keywords themselves can only confuse the database (I hope I have understood and explained this correctly).  The above example makes it clear that the inclusion of "Testudinidae" in a single output would be confusing/incorrect.  I will explain below how I deal with this.

3.  One result of including only leaf level synonyms is that you cannot directly search for images containing synonyms at higher than leaf level.  I find that I generally have the most memorable word as my keyword and something less memorable as the synonym (such as a Latin name), which means I do not need to search for synonyms.  But in any event, there is an easy way round this: in the filter bar below the keywords/synonyms, enter the synonym you are searching for.  This will bring up a list of all thesaurus entries containing this synonym.  Right click on one of these and select "Go to keyword".  You can then immediately see the keyword to which the synonym has been applied, and then search for that keyword.

4.  That is all well and good for images within the database.  My issues have centred around the attribution of keywords and synonyms to images that I prepare for use outside the database.  I have a large collection of images, mainly RAW files, which I generally roughly keyword as I go along.  I submit to stock agencies, for which I need just flat keywords (and potentially all synonyms at all levels, not just leaf, included as flat keywords).  I refine my keywords as I prepare individual images for submission. 

IMatch is of course a DAM, not a RAW editor, so the files have to be sent to my RAW editor of choice, Lightroom or Capture One.  From the editor I save or export the resulting JPEG file (or other output) to a separate location: I do not need to keep it in my DAM, although I could include it as an IMatch version if I preferred.  So either on the way out, or in the new file location, I need to make sure that all synonyms are allocated.  That is easy.  In the Keyword Panel, right click the  keywords you are interested in, and select "Locate in Thesaurus".  That will then show the leaf keyword ticked.  Then simply click all superior keywords to select them, and then their synonyms along the entire path/branch will be included.  I have requested a simple button or similar to facilitate checking all levels from leaf upwards, but Mario has concerns about its implications for the database, so this has to be done manually for each leaf keyword (I would still like this option, Mario!).

One issue is that for agency submission, I need just flat keywords, and not the hierarchical versions with the pipes between.  Once you have written metadata to file, you will see, in the Keywords section of the Metadata Panel, the hierarchical keywords (editable, but with their pipe separators) and (depending on how you have configured it) the flat IPTC and/or XMP keywords (including your synonyms), which are not editable, but can be copied by right clicking.  To get rid of the unwanted hierarchy separators and duplicate keywords, right click on the group of flat keywords, copy, and then paste into the hierarchical keywords panel (having selected the piped hierarchical keywords first so that they are overwritten).  Then you are left only with your flat keywords, comma or semi-colon separated, duplicates automatically deleted, ready for upload to the agency.  And if you go back to the Keywords Panel, you can re-order the keywords as you like, unconstrained by the former hierarchy order (which is essential for my agency submissions).

In this way, you would end up with "Animals,Wildlife,Undomesticated animals,Reptiles,Reptilia,Turtle,Tortoise,Testudinidae,Giant Tortoise,Testudo Gigantea", and you could re-order them so that the  most important keywords come first, not last as the hierarchy dictates.  Your database and keywords viewed through the IMatch database would still show the multiple entries to which you refer, but your images that you are sending elsewhere outside IMatch will have the keywords (including synonyms) you want, in the order you want, without duplication.
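
For readers who prefer to see the flattening step spelled out, here is a minimal Python sketch of the same idea outside IMatch: split hierarchical keywords at the pipe, drop duplicates, and keep the first-seen order. The `flatten` function is hypothetical, and treating keywords that differ only in letter case as duplicates is an assumption of this sketch, not something IMatch is known to do.

```python
def flatten(hierarchical_keywords, separator="|"):
    """Turn hierarchical keywords into a flat, duplicate-free list.

    Order of first appearance is preserved. Case-insensitive duplicate
    detection is an assumption made for this sketch.
    """
    seen = set()
    flat = []
    for keyword in hierarchical_keywords:
        for part in keyword.split(separator):
            part = part.strip()
            key = part.casefold()
            if part and key not in seen:
                seen.add(key)
                flat.append(part)
    return flat


paths = [
    "Animals|Wildlife|Reptiles|Turtle|Tortoise|Giant Tortoise",
    "Animals|Wildlife|Reptiles|Turtle|Tortoise|Testudo Gigantea",
]
print(", ".join(flatten(paths)))
```

The resulting list can then be re-ordered freely, since it no longer carries any hierarchy.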

I usually do the above by exporting my processed images to a new file location, which is the subject of a separate small, ever-changing catalogue, as batches of images are uploaded to the agency and then archived.  This avoids any risk of causing damage to the main database.  It sounds complicated, but it is not, and is very simple once you get used to it.  The ability to delete unwanted keywords or synonyms in the Keyword Panel is streets ahead of Lightroom (where in effect you have all the synonyms and all the intermediate keywords whether you like it or not).

It sounds to me as if your requirements might be similar to mine.  I thought, therefore that I would set out my workflow, in the hope that it helps with your evaluation of IMatch.  It took me some time to work all this out, but now that I have, it works like clockwork.

I hope this helps (and, Mario, apologies for the length of this post!).

Graham



Mario

I have absolutely nothing against long posts.
Especially not if they are helpful like this.
And if the user remembers to add a new paragraph once in a while  ;)

Sometimes users write very long posts in one, very long paragraph. And this makes the post virtually unreadable for users following this community on smaller screens or mobiles - like myself occasionally. This is when I write a note and ask the user to split his/her long post into better-readable paragraphs the next time.

Jingo

We've discussed this topic a few times in the past... there is even an old Feature Request out there I believe....

https://www.photools.com/community/index.php?topic=9125.msg64304#msg64304

I believe at some point I started to write an APP to support this type of feature... allow you to go back "x" keyword levels and add synonyms from those keywords as well.. but thinking of potential metadata issues, I pushed it off.... but, I can always revisit if there is enough interest and the integrity of the DB can be maintained.

graham1

I would definitely be interested in such an app.  I have no idea myself how to even start going about writing one: maybe learning something about this should be my next project during lockdown 😷

Graham

Mario

@Jingo

If you do such things, keep in mind how keyword import works, and re-mapping via the thesaurus during import.
If you add arbitrary synonyms 'along the path', you are actually adding additional keywords. And during re-import, these keywords will be mapped via the thesaurus, adding the linked keyword and all other keywords.

I would think very carefully before doing such things outside established standards. Metadata is a mess as it is and I can already envision the support requests for 'keywords being added all over the place after write-back'.

It is important to keep things simple.
If you want x synonyms to be added when you assign keyword K, add all these synonyms to K.
All intermediate keywords leading to K are just segments in the keyword path, they lead to K, but have no function.

Even if some of these synonyms also exist for other segments along the path. A synonym for location|outdoor is different from a synonym for location|outdoor|beaches. Because there are no synonyms, just keywords.

Assigning synonyms from location|outdoor|beaches and location|outdoor|beaches|Daytona when you assign the keyword location|outdoor|beaches|Daytona to a file will cause the keyword location|outdoor|beaches to be added during import and thesaurus re-mapping of synonyms. This is most likely not what is wanted here.
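
The side effect described above can be illustrated with a toy model in Python. Everything here is hypothetical: the `THESAURUS` contents, the synonym "coast", and the `remap_on_import` function only imitate the idea of mapping flat keywords back through a thesaurus. It is not IMatch's import logic.

```python
# Hypothetical thesaurus: full keyword path -> synonyms of that keyword.
THESAURUS = {
    "location|outdoor|beaches": ["coast"],
    "location|outdoor|beaches|Daytona": ["Daytona Beach"],
}


def remap_on_import(flat_keywords, thesaurus):
    """Map flat keywords found in a file back to hierarchical keywords.

    A thesaurus keyword is added when its leaf name or one of its
    synonyms appears among the flat keywords.
    """
    mapped = []
    for path, synonyms in thesaurus.items():
        leaf = path.rsplit("|", 1)[-1]
        if leaf in flat_keywords or any(s in flat_keywords for s in synonyms):
            mapped.append(path)
    return mapped


# Only the synonyms of the assigned keyword were written to the file.
print(remap_on_import(["Daytona", "Daytona Beach"], THESAURUS))

# A synonym from further up the path ("coast") was written as well.
print(remap_on_import(["Daytona", "Daytona Beach", "coast"], THESAURUS))
```

In the second call the file comes back with location|outdoor|beaches as an additional keyword of its own, which is the unwanted result the post warns about.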

Jingo

Quote from: Mario on May 12, 2020, 07:32:15 PM
@Jingo

If you do such things, keep in mind how keyword import works, and re-mapping via the thesaurus during import.
If you add arbitrary synonyms 'along the path', you are actually adding additional keywords. And during re-import, these keywords will be mapped via the thesaurus, adding the linked keyword and all other keywords.

I would think very carefully before doing such things outside established standards. Metadata is a mess as it is and I can already envision the support requests for 'keywords being added all over the place after write-back'.

It is important to keep things simple.
If you want x synonyms to be added when you assign keyword K, add all these synonyms to K.
All intermediate keywords leading to K are just segments in the keyword path, they lead to K, but have no function.

Even if some of these synonyms also exist for other segments along the path. A synonym for location|outdoor is different from a synonym for location|outdoor|beaches. Because there are no synonyms, just keywords.

Assigning synonyms from location|outdoor|beaches and location|outdoor|beaches|Daytona when you assign the keyword location|outdoor|beaches|Daytona to a file will cause the keyword location|outdoor|beaches to be added during import and thesaurus re-mapping of synonyms. This is most likely not what is wanted here.

This advice sounds VERY familiar and is probably the reason I didn't do it last time..  8)  I'm going to hold off for now - too many unknowns and gotchas with the metadata re-import. 


Mario

Quote from: Jingo on May 12, 2020, 07:47:59 PM
This advice sounds VERY familiar and is probably the reason I didn't do it last time..  8)  I'm going to hold off for now - too many unknowns and gotchas with the metadata re-import.

I agree. All the luxury we have with keywords, synonyms and the thesaurus exists only within IMatch.
Outside, there are only keywords. And only flat, non-hierarchical keywords are really standard. Hierarchical keywords are supported by many applications, but not by all. And not by all in the same way. And definitely not by many clients, agencies, web sites, services and systems often used in scientific or institutional environments.

Hence, keeping things simple is usually the best way to ensure optimal interoperability. Which should be a goal when it comes to dealing with metadata.

IMatch categories, however, are much safer to use for doing fancy things - because they don't need to be mapped or broken down to the most basic level understood by all participating applications and services. They can be mapped, in a controlled way, to keywords if so desired at some point. Or converted, mapped, bridged, 'exploded', splashed etc.

LRAT

Thanks to all for your contributions.

Please allow me to explain a bit more:

In my daily workflow I use PhotoMechanic (PM from now on). I love the program as it is great to ingest pictures and go through them quickly.
It also has a hierarchical keyword list. (The one I use in IMatch is the imported structured keyword list which I designed in PM).
PM allows multiple synonyms along the path to appear and doesn't add identical keywords and/or strings. That keeps it nice and clean in my opinion.
What I don't like about PM is the repetitive clicking to add keywords. That was the reason why I started looking into other software and I came across IMatch.
I like the GUI for adding keywords, and also the most frequently used keywords function and the favorites. It definitely speeds up the workflow when adding keywords.

Another reason why I like the idea of adding synonyms along the path is when one wants to add alternative spelling or languages. In IMatch these will become lost.
E.g. the correct spelling is "Keyword" but some people might write "Key word" (I know, but this is just an example). If I added "Key word" as a synonym to "Keyword", then the search function would deliver the same outcome regardless of the spelling. In IMatch, this means I would need to create multiple keyword entries, which adds even more to the workload.

This sort of function is more common than you might think. Let me explain:
I am originally from Belgium (but migrated to Australia more than 20 years ago). In Belgium we have three national languages: Dutch, French and German. So, if I want to add the name of a town, it can have three versions of its name. Let me show you an example: in Dutch we call the town "Luik", in French it is "Liège" and in German it is "Lüttich". So if I could make a keyword "Luik" with the two synonyms "Liège" and "Lüttich", then it wouldn't matter which language you use, as your search would bring up the same result. In IMatch I would need to type in "Luik", otherwise the search would end up with no results.
This scenario is also valid for local names. For example, Uluru in Australia is also known as Ayers Rock, and Kata Tjuta is better known as The Olgas. I think you catch my drift.
This is the reason why it is important for me to have the synonyms attached to the keywords along the path.

If only Mario and the PM team could come together and incorporate the best of both software packages.

I will most likely purchase IMatch after the evaluation period is over, as it has great features that I like. However, I will probably need to continue using PM to add the keywords.

Thanks for your time!

Mario

Quote:
In IMatch, this means I would need to create multiple keyword entries which adds even more to the work load.

Keyword: "Keyword"
Synonyms: "Key Word", "Key-word", "Keywords", "Schlüsselwort", "mot-clé" ...

I don't understand the problem you are facing...

IMatch does not add keywords (or synonyms) along the path.
Adding the keywords

WHERE
WHERE|Location
WHERE|Location|Outdoor
WHERE|Location|Outdoor|Beaches
WHERE|Location|Outdoor|Beaches|Daytona

when you add the keyword "WHERE|Location|Outdoor|Beaches|Daytona" to a file does not really make sense. You add these implicitly, as hierarchy levels.
For synonyms, the same rule applies. As I said, no metadata standard covers synonyms. It's all keywords in the end.

For example, if "WHERE|Location|Outdoor" has a synonym "Draussen", which keyword would this produce when you assign "...Daytona"?
Does this produce the keyword "WHERE|Location|Draussen", or "WHERE|Location|Draussen|Beaches|Daytona", or ...?
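
The ambiguity raised in this question can be made concrete with a short Python sketch. The `candidate_paths` function is hypothetical and simply enumerates the two readings named above; it does not describe anything IMatch or PM actually does.

```python
def candidate_paths(assigned, level, synonym):
    """List keyword paths that a synonym of an intermediate level
    could plausibly produce when `assigned` is added to a file.
    """
    parts = assigned.split("|")
    index = level.count("|")  # position of the segment the synonym replaces
    head = parts[:index] + [synonym]
    return [
        "|".join(head),                       # synonym as its own endpoint
        "|".join(head + parts[index + 1:]),   # synonym swapped into the full path
    ]


for path in candidate_paths(
    "WHERE|Location|Outdoor|Beaches|Daytona",
    "WHERE|Location|Outdoor",
    "Draussen",
):
    print(path)
```

Both results are defensible, and other readings are possible as well, which is why no single "correct" keyword falls out of a synonym placed along the path.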

If PM handles this differently and you like it more, use PM to add keywords.
As long as PM writes standard hierarchical keywords and synchronizes them into flat XMP keywords and optionally flat legacy IPTC keywords and updates the image file and XMP sidecar file for RAW files with the data, all should be well.

Just make sure that your thesaurus in IMatch reflects the PM structure you use so IMatch knows how to map the keywords found in XMP and IPTC back into your hierarchy.
I don't know how PM flattens hierarchical keywords into XMP and IPTC, so I cannot give you detailed tips for this.

graham1

I understand why it is not possible/sensible to include all synonyms along the entire branch as keywords, only those at leaf level.  I have recently come across an issue which leads me to ask whether it is possible to include no synonyms at all, not even any at leaf level?  My issue arises from dealing with a number of files with differing keywords, some of which have synonyms at leaf level while others do not.  Is there any way to tell IMatch not to include any synonyms at all as keywords?  If not, would it be appropriate to ask for this as a feature request?

The nearest I can find in the help files is that synonyms are not included if a keyword already in the thesaurus is typed into the Keywords panel, but typing in keywords is not a practical solution for my purposes.

Graham

Mario

When a keyword has synonyms and you add that keyword to a file (in any way), the keyword and the synonyms will be added (duplicates are avoided).
That's how this is supposed to work.

Have you considered cleaning up your keywords using the advanced features provided by the @Keywords Category?
Removing one or more keywords from multiple files, exchanging keywords etc. is very easy with that.
There is a dedicated tutorial video for @Keywords in the IMatch Learning Center: https://www.photools.com/imatch-learning-center/