New import format in the thesaurus (synonyms)

Started by KimAbel, July 18, 2013, 01:02:21 AM


KimAbel

I have been checking the thesaurus a bit. I really like the "Group levels" feature, which lets me group my keywords into groups like "Birds", "Mammals", "Fish" and so on.

My problem is getting all my keywords with synonyms into IM5. In IM3 I could manage my keywords in Excel, where each keyword had all its synonyms on the same row. I could then save the sheet as text and copy it into the IM3 .dat file. One row would then look like this:

"abbor","Perca fluviatilis","art","Animalia","dyreriket","Chordata","ryggstrengdyr","Vertebrata","Actinopterygii","Perciformes","Percidae","abborfamilien"

"abbor" is the keyword and all the rest is synonyms.

IM5 uses another system, where I have to use one row for the keyword and a new row for each synonym. The large Excel files with scientific classification systems that I manage are set up with all the info about one species in one row (after my customization of the list, each species (each row) can have a different number of synonyms (columns)). Converting such a file into the IM5 system is a lot of work. It is much easier with a setup like in IM3.
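As an aside for anyone who wants to script this: an IM3-style row like the one above is plain quoted, comma-separated text, so a standard CSV parser can split it into the keyword and its synonyms. A minimal Python sketch, assuming every row follows the example exactly (double-quoted fields, commas as separators, keyword first, synonyms after); `parse_im3_row` is a hypothetical helper, not an IMatch function:

```python
import csv


def parse_im3_row(line):
    """Split one quoted, comma-separated row into (keyword, synonyms).

    Hypothetical helper: assumes the layout of the example row, i.e. the
    first field is the keyword and every following non-empty field is a
    synonym. Returns None for blank rows.
    """
    rows = list(csv.reader([line]))  # csv takes care of the double quotes
    fields = [f.strip() for f in rows[0] if f.strip()] if rows else []
    if not fields:
        return None
    return fields[0], fields[1:]


keyword, synonyms = parse_im3_row('"abbor","Perca fluviatilis","art","Animalia"')
print(keyword, len(synonyms))  # prints: abbor 3
```

Once split like this, the keyword and synonyms can be written out in whatever layout the target import format expects.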

So my hope is that one more import feature will be added to IM5. The best would be the option to import an Excel file into a chosen "Group level" in the thesaurus. Or perhaps the group level, or several group levels, could be specified in the Excel file.

The group levels let me deal with several smaller groups instead of the whole list. When something changes in the scientific classification, I can then delete the group that needs the change and import a new one, instead of the whole list.

If this is something worth spending time on, I will gladly e-mail you a test version of the Excel file.

Regards
Kim Abel

Richard

Quote: a new row for each synonym
Hi Kim,

That is not my understanding from reading "The Universal Thesaurus" in Help. You can add a row like:
Perca fluviatilis;art;Animalia;dyreriket;Chordata

See the attachment

[attachment deleted by admin]

Mario

The IMatch 5 thesaurus supports multiple import formats, including the .DAT format used by the IMatch 3 thesaurus.

So if you only need to import all your data once, produce a regular IMatch 3 IMIPTCUSR.DAT as before and import that once in the IMatch 5 thesaurus.

As Richard pointed out, you can add lists of elements or synonyms easily in the thesaurus editor, so if you update your Excel sheet for some reason, you can copy/paste the synonym lists into the thesaurus directly.
-- Mario
IMatch Developer
Forum Administrator
http://www.photools.com  -  Contact & Support - Follow me on 𝕏 - Like photools.com on Facebook

KimAbel

Perhaps it is me who doesn't quite understand how it works, but when I made a test group level I could paste in all the keywords I wanted, but not the synonyms. If I want synonyms, I have to select one keyword and then paste in all the synonyms for that one keyword. I have tried with "Use sub-elements" both on and off. So how can I paste both a large keyword list and all the synonyms into the thesaurus in one step? What should the text I paste into the thesaurus look like, and what should the settings be? Here are a few example rows. The first word in each row is the keyword, and the rest of the row is synonyms. A real import for one group level can be several thousand rows.

åkerrødtopp;Odontites vernus vernus;art;Plantae;planteriket;Magnoliophyta;Magnoliopsida;Lamiales;Orobanchaceae;snylterotfamilien
åkersalat;Valerianella rimosa;art;Plantae;planteriket;Magnoliophyta;Magnoliopsida;Dipsacales;Valerianaceae;vendelrotfamilien
åkersanger;Acrocephalus agricola;art;Animalia;dyreriket;Chordata;ryggstrengdyr;Vertebrata;Aves;fugler;Passeriformes;spurvefugler;Sylviidae;sangerfamilien
åkersennep;Sinapis arvensis;art;Plantae;planteriket;Magnoliophyta;Magnoliopsida;Brassicales;Brassicaceae;korsblomstfamilien
åkersjampinjong;Agaricus arvensis;art;Fungi;soppriket;Basidiomycota;stilksporesopper;Agaricomycotina;Agaricomycetes;Agaricales;Agaricaceae;sjampinjonger
åkersnelle;Equisetum arvense;art;Plantae;planteriket;Pteridophyta;Sphenopsida;Equisetales;Equisetaceae;snellefamilien
åkersteinfrø;Buglossoides arvensis;art;Plantae;planteriket;Magnoliophyta;Magnoliopsida;Boraginales;Boraginaceae;rubladfamilien
åkersteinkløver;Melilotus infestus;art;Plantae;planteriket;Magnoliophyta;Magnoliopsida;Fabales;Fabaceae;erteblomstfamilien
åkerstemorsblom;Viola arvensis;art;Plantae;planteriket;Magnoliophyta;Magnoliopsida;Malpighiales;Violaceae;fiolfamilien
åkerstorkenebb;Geranium dissectum;art;Plantae;planteriket;Magnoliophyta;Magnoliopsida;Geraniales;Geraniaceae;storkenebbfamilien
åkersvineblom;Senecio vulgaris;art;Plantae;planteriket;Magnoliophyta;Magnoliopsida;Asterales;Asteraceae;kurvplantefamilien

I have imported my old keyword list from IM3 and it works, but I am now thinking about how I can best deal with my keywords and synonyms in the future.

One other thing:
After I imported my old IM3 keywords and synonyms, everything works well when assigning keywords and synonyms in the keyword panel, but when I am trying to look at the synonyms in the thesaurus I can't find them. Is this right? I would expect to see synonyms for each line. See the attachment for an example. Each keyword you see there has several synonyms, but they aren't visible.

Kim Abel

[attachment deleted by admin]

Mario

Quote: can I paste both a large keyword list and all the synonyms into the thesaurus in one step?

You can't. You can enter/paste multiple keywords at the same time, and you can enter/paste multiple synonyms at a time.
This feature was designed to allow users to quickly enter a batch of new keywords or new synonyms. This is not an import feature for mass data.

Quote: but I am now thinking about how I can best deal with my keywords and synonyms in the future.

Well, most thesauri are pretty stable. Photo agencies and businesses usually work with fixed thesauri: a few new keywords and celebrity names per month, which is easy to handle with the tools available in IMatch. And it needs to be done only once; then the thesaurus can be distributed to other users for import.

Scientific classification systems are another field for large thesauri. But new species are rare, and adding massive amounts of keywords with associated synonyms is not a typical use case. If that is required often, it is usually easier to make these changes directly in a flat file or XML file format supported by IMatch, either manually or via a script / program.

Standard vocabularies in use, e.g. the controlled vocabulary or the Library of Congress vocabularies, can be bought/downloaded in formats which can be imported directly into IMatch 5.

Quote: but when I am trying to look at the synonyms in the thesaurus I can't find them.

This is a bug that has been reported for build 5.0.102 and has been fixed for the next build. See the bug report forum for details:

https://www.photools.com/community/index.php?topic=373.0

KimAbel

Quote: But new species are rare, and adding massive amounts of keywords with associated synonyms is not a typical use case. If that is required often, it is usually easier to make these changes directly in a flat file or XML file format supported by IMatch, either manually or via a script / program.

Usually it's not the occurrence of new species that makes this a changing list. More often it's new Norwegian names and new synonyms that I want to include. Also, which species have changed is not always easy to spot among thousands of records, so it's usually easier to import a new list. Since my scripting skills are poor and most of these lists are already in Excel format, I would much prefer Excel (or text derived from Excel) as an import source. As I explained earlier, it's not an easy task to convert it into the structure that IM5 already supports. Since there is already a function to import from an IM3 file (with a similar structure), I would think it only needs some modification to work in IM5 directly, without having to paste the data into an IM3 file first.

I see that this isn't a priority now during the beta, but I really wish this could be possible later :)

Regards
Kim Abel

DigPeter

Quote from: Mario on July 19, 2013, 08:38:35 AM
Scientific classification systems are another field for large thesauri. But new species are rare, and adding massive amounts of keywords with associated synonyms is not a typical use case. If that is required often, it is usually easier to make these changes directly in a flat file or XML file format supported by IMatch, either manually or via a script / program.
@Mario - what is "massive" in this context? The total number of taxa in the world is massive. In the UK, the wild flora taxa number some 7,000+. At least 50% of these have recognised common names, which could be added to the scientific names as synonyms. In the botanical field there are frequent name changes because of molecular analysis and other activities. This is a bit of a pain to change in IM3.6. Whole groups of genera have recently been moved from one family into a different, existing or new family. This was very time-consuming for me to implement. I would be interested in any ideas on how this could be facilitated in IM5.

Mario

Quote: Whole groups of genera have recently been moved from one family into a different, existing or new family. This was very time-consuming for me to implement. I would be interested in any ideas on how this could be facilitated in IM5.

To move a thesaurus node (with all sub-nodes and synonyms) to another node (re-parent) I would just copy the node into the clipboard and paste it under the new parent. Then delete it under the original location. Would that work for you as well?

Quote: massive

7,000 entries is quite a lot. But not uncommon. In my experience, there are probably as many formats for thesaurus data as there are applications for it. Every agency or organization seems to come up with their own format. In many cases these are text formats which can be converted into one of the formats IMatch 5 accepts as input.

The thesaurus supports four input formats:

  • The native IMatch 5 XML thesaurus format. This format is the most versatile because it can handle all extended attributes available in the thesaurus.
  • Import of IMatch 3 IMIPTCUSR.DAT files. This is the format in which IMatch 3 stored thesaurus data in the IPTC editor.
  • Import of IMatch 3 categories. For users who want to convert hierarchies they maintained as categories into proper IMatch 5 thesaurus data.
  • Import of a tab-indented text format which is used by some standard applications (e.g. LR) and many of the controlled vocabularies out there.
Especially the last format is very easy to produce "by-hand" or from all kinds of sources.

I currently have no plans to add an Excel import format. The OP may use a format where the keyword and synonyms are all in one row. The next user will use a format which splits keywords and synonyms into multiple rows, or uses different separators, or even multiple worksheets with links. Writing an all-purpose Excel import would be quite a bit of work, for maybe a handful of users who would have a use for such a feature.

Better to implement that elsewhere and produce one of the four input formats supported by IMatch 5. Or implement it as a purpose-built IMatch 5 script which understands the specific Excel format used by one user, and then produces an input file for IMatch 5 from that.

DigPeter


KimAbel

Quote: To move a thesaurus node (with all sub-nodes and synonyms) to another node (re-parent) I would just copy the node into the clipboard and paste it under the new parent. Then delete it under the original location. Would that work for you as well?

This won't help, because it is the scientific name (or in my case the Norwegian name) that is the keyword, and the rest are synonyms. To make such a change I must change the synonyms for each keyword, once per species, for every species that has changed. I can't see any shortcut for handling the changes that come from time to time.

Quote: Especially the last format is very easy to produce "by-hand" or from all kinds of sources.

If this is true, then I agree that there is no need for any new import format. So if anyone can point me to the right tool or method to go from this

åkersnelle;Equisetum arvense;art;Plantae;planteriket;Pteridophyta;Sphenopsida;Equisetales;Equisetaceae;snellefamilien

to this

åkersnelle
       Equisetum arvense
       art
       Plantae
       planteriket
       Pteridophyta
       Sphenopsida
       Equisetales
       Equisetaceae
       snellefamilien
(the existing IM5 text import format)

I would be very happy :-) The number of synonyms is not constant for all species. Some have a few and others can have many. The number of records to change would be many thousands.

Quote: I currently have no plans to add an Excel import format. The OP may use a format where the keyword and synonyms are all in one row. The next user will use a format which splits keywords and synonyms into multiple rows, or uses different separators, or even multiple worksheets with links. Writing an all-purpose Excel import would be quite a bit of work, for maybe a handful of users who would have a use for such a feature.

It doesn't have to be an Excel import. A plain-text import with one row per entry is enough. This is easily derived from Excel, and you could specify the separator once for all, like you did with the IM3 .dat file. With Excel I can combine almost all kinds of data; it's a very powerful tool for combining these lists. The most versatile form is one row for each keyword and its synonyms. When the list contains anywhere from a few hundred to many thousands of records, it is much easier to work with in Excel than to hand-type the existing IM5 format. In Excel I can use formulas to bring in other synonyms from other sources, for instance threat categories from the IUCN Red List of Threatened Species. I would also guess that many more keyword lists besides scientific classification systems are derived from Excel.

Quote: Better to implement that elsewhere and produce one of the four input formats supported by IMatch 5. Or implement it as a purpose-built IMatch 5 script which understands the specific Excel format used by one user, and then produces an input file for IMatch 5 from that.

A good solution, but I don't have the knowledge to do this. If anyone sees the need for this, I would be very happy if such a script were written. I don't think it needs to be complicated or leave room for many customizations of the import format, since it's easy to adjust the Excel file and the resulting text file as long as they are based on one row per record.

I will make this my last attempt at explaining the benefits of an import in the form "one row per record", as I fully understand that you have to prioritize how you spend your time. I may be one of a handful who would appreciate this, but my guess is that quite a few people would use this feature (importing keywords and synonyms directly into a group level from a text file with one row per record).


Regards
Kim Abel

ChrisMatch

Quote from: KimAbel on July 20, 2013, 01:38:16 AM
Quote: Especially the last format is very easy to produce "by-hand" or from all kinds of sources.

If this is true, then I agree that there is no need for any new import format. So if anyone can point me to the right tool or method to go from this

åkersnelle;Equisetum arvense;art;Plantae;planteriket;Pteridophyta;Sphenopsida;Equisetales;Equisetaceae;snellefamilien

to this

åkersnelle
       Equisetum arvense
       art
       Plantae
       planteriket
       Pteridophyta
       Sphenopsida
       Equisetales
       Equisetaceae
       snellefamilien
(the existing IM5 text import format)

Isn't that a simple search and replace where ; is replaced with <newline><tab>? Such a transformation can be done in editors like Notepad++.
(If I understood you right, I already did this as a little test - took only a few seconds).
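The same search and replace can be scripted when a file has thousands of rows. A minimal Python sketch, assuming UTF-8 text with one record per line, `;` as the separator and the keyword in the first field; the function names are hypothetical. Empty fields (for example trailing `;;;` left over from Excel) are dropped. Note that in this tab-indented format every indented line becomes a child keyword, not a synonym.

```python
def row_to_indented(row, sep=";"):
    """Turn 'keyword;a;b' into 'keyword\\n\\ta\\n\\tb' (one tab per child)."""
    fields = [f.strip() for f in row.split(sep) if f.strip()]
    if not fields:
        return ""  # blank or separator-only row
    keyword, *children = fields
    return "\n".join([keyword] + ["\t" + child for child in children])


def convert_lines(lines, sep=";"):
    """Convert every row and join the resulting blocks, skipping blank rows."""
    blocks = (row_to_indented(line, sep) for line in lines)
    return "\n".join(block for block in blocks if block)


print(convert_lines(["åkersnelle;Equisetum arvense;art;Plantae"]))
```

Splitting on the separator rather than doing a blind text replacement makes it easy to drop empty columns and stray spaces in the same pass.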



[attachment deleted by admin]

KimAbel

That almost solved it for me :-) Thank you very much. I was concentrating on doing all of the work in Excel first and did not see that solution.

I see now that the resulting list in IM5 is a hierarchical keyword list. The thesaurus doesn't explain what the file for synonyms should look like. I have tried looking at the file produced by a text export, but it looks strange, since all the synonyms are on the same level as the keywords. Perhaps this is related to the bug https://www.photools.com/community/index.php?topic=373.0 :

[A]
   Aagaardia protensa
   art
   Animalia
   dyreriket
   Arthropoda
   leddyr
   Insecta
   insekter
   Diptera
   tovinger
   Chironomidae
   fjærmygg
   Aagaardia sivertseni
   art
   Animalia
   dyreriket
   Arthropoda
   leddyr
   Insecta
   insekter
   Diptera
   tovinger
   Chironomidae
   fjærmygg

The other thing that I have yet not solved is how to get the list that I am importing into the group level that I want. I can import the list, but in order to get that list into the group level I have to move each entry manually.

I see that I can specify the group level in the import text file, but I have not managed yet to use the search and replace function to make it look as needed.

Is it possible to allow an import into a selected group level?

Regards
Kim Abel

Mario

The flat text format has no knowledge of synonyms. Every line indicates a keyword, and the number of tabs indicates the level. This is the format used by LR and others and they don't know about synonyms.
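To make the rule concrete, here is a minimal Python sketch that reads the flat text format back into (level, keyword) pairs, where the level is simply the number of leading tabs, and rebuilds each line's full path. It assumes tabs (not spaces) are used for indentation; the `|` used to join paths is only a display choice, and the helper names are hypothetical.

```python
def parse_levels(text):
    """Return (level, keyword) pairs; level = number of leading tabs."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines carry no keyword
        stripped = line.lstrip("\t")
        level = len(line) - len(stripped)
        pairs.append((level, stripped.strip()))
    return pairs


def to_paths(pairs, sep="|"):
    """Rebuild each keyword's full path from the (level, keyword) pairs.

    Assumes well-formed nesting: a line is at most one level deeper
    than the line before it.
    """
    stack = []
    paths = []
    for level, keyword in pairs:
        del stack[level:]  # keep only the ancestors of this line
        stack.append(keyword)
        paths.append(sep.join(stack))
    return paths


sample = "[A]\n\tAagaardia protensa\n\t\tart\n\tAagaardia sivertseni"
for path in to_paths(parse_levels(sample)):
    print(path)
```

Because the level is the only structural information, every line comes back as a keyword in the hierarchy; there is nothing in this format that could mark a line as a synonym.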

KimAbel

So then there is no way of importing synonyms via the text import?

Kim

KimAbel

Lightroom also deals with synonyms. This list can be created in a text editor, and you can see an example here:

http://lightroom-news.com/2009/05/04/keyword-list-creation-outside-lightroom/

Kim

Mario

Good info, thanks. I did not know that LR introduced that.

I may look into adding this as an enhancement in a later version of IMatch 5. I need to read that article and then decide when to implement this.

KimAbel

Good, Mario. Please consider an import directly into a chosen group level, since there is no option to move many entries in one step.

Regards
Kim Abel

Mario


KimAbel

In the help file under "The Universal Thesaurus" you have written

"Using Group Levels
This is where group levels come into play. When you enable this property for an element in the thesaurus, IMatch does not include this element anymore when producing keywords from thesaurus entries."

That's the group level I am referring to :-)

Then I can create a "Group level" called "Mammals" and import my keyword and synonym list with all the mammals. This gives me a very good way to keep a nice and structured thesaurus. As I explained earlier, I know that the text import lets me specify a group level in the text file (hierarchical keywords), but this makes the text file quite a bit more complicated to produce. At least my attempts to make this work with a large file did not succeed.

Regards
Kim Abel


Mario

All groups start as regular keywords. You add the keyword Mammals and then set the option in the Thesaurus Manager to mark it as a group.
In the text import format, group levels need to be enclosed in [] => [Mammals]. If you do that, IMatch will automatically convert Mammals into a group level on import. Did you try that?
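Adding the group level is one more wrapping step on top of the semicolon-to-tab conversion: write the group in brackets on the first line and indent every keyword one extra tab. A minimal Python sketch, assuming `;`-separated rows with the keyword first, and assuming the `[ ]` marker behaves as described here on import; `rows_to_group_text` is a hypothetical name. As with the plain text format, the remaining fields end up as child keywords.

```python
def rows_to_group_text(group, rows, sep=";"):
    """Build a three-level text file: [group] > keyword > child fields.

    Hypothetical helper: the brackets around the group are the group-level
    marker; every other field in a row becomes a child of its keyword.
    """
    lines = ["[" + group + "]"]
    for row in rows:
        fields = [f.strip() for f in row.split(sep) if f.strip()]
        if not fields:
            continue  # skip blank rows
        keyword, *children = fields
        lines.append("\t" + keyword)  # level 1: directly under the group
        lines.extend("\t\t" + child for child in children)  # level 2
    return "\n".join(lines)


print(rows_to_group_text("Plantae", ["åkersnelle;Equisetum arvense;art"]))
```

Generating the group line in code avoids having to add a group column in the spreadsheet itself.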

KimAbel

I tried that, but including the group level in the Excel spreadsheet made the conversion to the text import format more complicated. Two levels (keyword and synonyms) were quite easy to handle, but not three (group level, keywords and synonyms). Alternatively, you could allow the user to select more than one entry in the thesaurus and drag and drop them onto a group level.

Regards
Kim Abel

DigPeter

Quote from: Mario on July 21, 2013, 08:56:00 AM
All groups start as regular keywords. You add the keyword Mammals and then set the option in the Thesaurus Manager to mark it a group.
In the text import format, group levels need to be enclosed in [] => [Mammals]. If you do that, IMatch will automatically convert Mammals into a group level on import. Did you try that?
But when a node is marked 'Group', at present that node is also excluded from the Category panel.  That surely is not intended?
See thread https://www.photools.com/community/index.php?topic=387.0

Ferdinand

Quote from: DigPeter on July 21, 2013, 10:07:19 AM
But when a node is marked 'Group', at present that node is also excluded from the Category panel. That surely is not intended?

Yes, it is intended.  If a node is Group, it is not written to the file and so does not appear in @Keywords, since they reflect what is actually in the file.  I will reply to your other thread shortly.

DigPeter

Quote from: Ferdinand on July 21, 2013, 12:11:07 PM
Quote from: DigPeter on July 21, 2013, 10:07:19 AM
But when a node is marked 'Group', at present that node is also excluded from the Category panel. That surely is not intended?

Yes, it is intended.  If a node is Group, it is not written to the file and so does not appear in @Keywords, since they reflect what is actually in the file.  I will reply to your other thread shortly.
If that is what is intended, it is not, as I recall, what we were asking for back in pre-beta and which I think was implemented initially.  I definitely want all nodes in @keywords but not top ones and some others in flat keywords.

Ferdinand

Quote from: DigPeter on July 21, 2013, 05:56:26 PM
If that is what is intended, it is not, as I recall, what we were asking for back in pre-beta and which I think was implemented initially. I definitely want all nodes in @keywords but not top ones and some others in flat keywords.

Then you want Exclude, as you will see from my reply to your other post.

Mario

@KimAbel

I have added support for synonyms in the text import and export formats. When you export your thesaurus as text, synonyms are now indented and included in curly braces {}. On import, elements in {} are considered as synonyms and added as such.

This change will be shipped with build 5.0.104.

If you want, you can send me a sample file you have prepared and I will import it as a test.
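With synonyms in `{}` supported by the text format (build 5.0.104, per the post above), a one-row-per-record list could be turned into an import file directly. A minimal Python sketch, assuming that each synonym goes on its own line, one tab deeper than its keyword and wrapped in braces, and that an optional group is written as `[Group]` on the first line; the exact layout IMatch expects should be checked against a text export from that build. The function names are hypothetical.

```python
from typing import Iterable, List, Optional


def row_to_synonym_lines(row: str, indent: str = "", sep: str = ";") -> List[str]:
    """'keyword;syn1;syn2' -> keyword line plus one '{syn}' line per synonym."""
    fields = [f.strip() for f in row.split(sep) if f.strip()]
    if not fields:
        return []  # blank row
    keyword, *synonyms = fields
    # Assumed layout: synonyms sit one tab below their keyword, in braces.
    return [indent + keyword] + [indent + "\t{" + s + "}" for s in synonyms]


def build_import_text(rows: Iterable[str], group: Optional[str] = None,
                      sep: str = ";") -> str:
    """Build the whole file; keywords go one level under the group if given."""
    lines = ["[" + group + "]"] if group else []
    indent = "\t" if group else ""
    for row in rows:
        lines.extend(row_to_synonym_lines(row, indent, sep))
    return "\n".join(lines)


print(build_import_text(["åkersnelle;Equisetum arvense;art"], group="Plantae"))
```

This keeps the spreadsheet as the master copy: re-exporting one group and regenerating its file replaces the manual copy and paste.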

KimAbel

Good news:) Looking forward to testing it.

Is import into a group level also in this release, or perhaps later?

I will make you a test list as soon as possible, but I am going away on a mountain hike for a few days first.

Regards
Kim Abel

Mario

Quote: Is import into a group level also in this release, or perhaps later?

As I said above, if you include keywords in [] in your import files, they will become groups automatically. This is already implemented in the version you have.

Ferdinand

Is there scope to request that there also be a way to mark Excluded nodes in import files?

Mario

This is supported in the native IMTHS format, which also supports all other attributes and properties of thesaurus nodes.

The text format has no notion of exclude. This format is supported by a range of applications and I cannot introduce changes to it.

DigPeter


DigPeter

Quote from: Mario on July 22, 2013, 08:36:58 AM
This is supported in the native IMTHS format, which also supports all other attributes and properties of thesaurus nodes.

The text format has no notion of exclude. This format is supported by a range of applications and I cannot introduce changes to it.
Thanks - my post crossed with yours.

KimAbel

Quote: As I said above, if you include keywords in [] in your import files, they will become groups automatically. This is already implemented in the version you have.

Hello Mario

I have just tested a small text import file and it works :-) The only thing is that my group element is not marked as a group element in the thesaurus after the import. In the attachment you can see the test file I used, where I created a new group level called "Pattedyr" (Norwegian for "mammals"). I unchecked both the exclude options in the import thesaurus dialog box.

Quote: Two levels (keyword and synonyms) were quite easy to handle, but not three (group level, keywords and synonyms). Alternatively, you could allow the user to select more than one entry in the thesaurus and drag and drop them onto a group level.

That's the reason for my wish to import directly into a group level, or to be able to drag and drop multiple thesaurus elements, but I understand that you don't want to spend time on it right now. I will just leave my wish here and hope for a later improvement. The thesaurus would be much easier to organize with this option.

Regards
Kim Abel

[attachment deleted by admin]

Mario

Quote: I unchecked both the exclude options in the import thesaurus dialog box.

These options need to be on to convert thesaurus [entries] into group levels on import:



I've imported your file and this is what I get:



The only other thing I did was to correct the format of the file. Your file uses duplicate carriage-return/linefeed pairs. Only one carriage-return/linefeed should be used.
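For a large file, this clean-up can be automated. A minimal Python sketch that rewrites an import file with exactly one carriage-return/linefeed per line and drops empty lines; it assumes the file is UTF-8 (an assumption, check what your editor or Excel export produces) and that blank lines carry no meaning in the import format. The function names are hypothetical.

```python
from pathlib import Path


def collapse_blank_lines(text):
    """Keep non-empty lines only and end each with a single CR/LF."""
    # splitlines() understands \r\n, \n and \r, so doubled pairs
    # simply show up as empty strings, which are filtered out here.
    lines = [line for line in text.splitlines() if line.strip()]
    return "\r\n".join(lines) + "\r\n" if lines else ""


def fix_file(src, dst):
    """Read src, collapse doubled line breaks, write the result to dst."""
    text = Path(src).read_text(encoding="utf-8")
    # newline="" stops Python from turning each "\n" into "\r\n" again
    # on Windows, which would otherwise produce "\r\r\n".
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(collapse_blank_lines(text))
```

Writing the CR/LF pairs explicitly with `newline=""` gives the same bytes on every platform.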

[attachment deleted by admin]

KimAbel

Ok.

Then it's the Norwegian translation of these two checkboxes that is wrong. They say

"Exclude elements in [and]."
and
"Exclude elements in CAPITAL LETTERS".

I think I will revert to the English version until the official IM5 comes out :-)

Kim

Mario

Quote: I think I will revert to the English version until the official IM5 comes out :-)

This will not help. IMatch is translated into other languages by volunteer users. This is an ongoing effort and I cannot say if and when a complete Norwegian translation will become available. I'm always looking for volunteers, but only a few users responded  :-X

This little glitch may have crept in when I changed some functionality.


Please translate:

Mark elements in [ and ] as group level

and

Mark elements in UPPER CASE as group level

into Norwegian.

ovrevid

I have fixed the erroneous translation.

@Kim: Please continue to use the Norwegian version. If no one uses it, we can't make it better and errors will go unnoticed! If you find other errors, bad/wrong wording, phrasing or whatever, feel free to PM or email me about it, or you can create a thread in the Translation and Localization board. Initially we were two translators, Vemund and myself, but Vemund hasn't been active for quite some time, so maybe he has "left the building"(?). I know that he used some form of automatic or semi-automatic translation tools, and I have seen that this has sometimes been the reason for "funny" translations.

There are still hundreds of untranslated strings in IMatch, but although my time is limited we will get there in the end  :)

br
Vidar
-- Vidar

Mario

Thanks, Vidar.

We're still looking for volunteers to help with the IMatch translation, or who can look over my original English and German resources  ::)

KimAbel

Quote: @Kim: Please continue to use the Norwegian version.

Ok Vidar. I will inform you about errors:)

Kim

picolo

Quote: Or who can look over my original English and German resources  ::)
I can look over it as time permits...
Cheers, Michael
__________________________________________
Intel i7 | 8GB | ATI HD5770 | OS: Win8 (64 Bits)
http://picolo-photography.com