Thesaurus considerations in a changing world of science, nations and more... ;)

Started by MrPete, August 26, 2024, 06:33:11 PM


MrPete

(Ugh. Just lost most of what I'd written. GOT to learn to write elsewhere and paste online!)

This is NOT a Feature Request... I don't even know what feature(s) I'd request at this point.

This is a post to highlight some observations based on what I'm learning from users here, from Mario, and from my own background.

The current design for the Thesaurus is, as Mario has said: "Once set up, a thesaurus should rarely change. That's the idea of a controlled vocabulary. A fixed, rarely changing, set of keywords to allow for a quick and consistent keywording experience."

As noted by @rolandgifford and others, and as I note from interactions with some experts in my wife's birding group (she has some amazing resource people available!)...

Scientific taxonomies are mostly stable. Yet a small number of changes are regularly seen. Bird taxonomy is one of the MOST stable, yet as seen in the eBird/Clements taxonomy...
  • It is updated annually (every October).
  • While brand new species are rarely added, every year there are splits and lumps. AFAIK these are often due to new discoveries via DNA sequencing. But they cause the list to change.
  • Each new release has references to the prior year list, including old/new code links. So there is some hope of being able to auto-merge or at least discover which entries are affected.
  • NOTE: there are both unique and "quick" (not unique but easy to use) codes for bird species. These codes tend to be more stable than bird species info.
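As a rough illustration of the "discover which entries are affected" idea: a minimal Python sketch, assuming the old/new code links from a release have already been loaded into a dictionary. The function name, the data layout, and the codes below are all made up for illustration; they are not real eBird/Clements codes or any published file format.

```python
def find_affected(used_codes, changes):
    """Return {old_code: [new_codes]} for every code in use that the update touches."""
    return {code: changes[code] for code in sorted(used_codes) if code in changes}


# Hypothetical example data (placeholder codes, not from any real checklist).
used = {"aaaa1", "bbbb1", "cccc1"}      # codes already applied to my photos
changes = {
    "bbbb1": ["bbbb2", "bbbb3"],        # split: one old entry becomes two
    "dddd1": ["cccc1"],                 # lump: old entry folded into another
}

affected = find_affected(used, changes)
for old_code, new_codes in affected.items():
    print(old_code, "->", ", ".join(new_codes))
```

Only codes that are actually in use get reported, so a lump or split of a species I have never photographed produces no work.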


My own professional work includes name and coding systems for geography (countries, provinces, etc), languages and more. The realities there are similar to scientific taxonomies:
  • Name and coding systems are mostly stable. Countries/Provinces are ISO 3166; languages are ISO 639 (-1, -2 and -3)
  • However, there have been changes -- at least one every year in the last few decades except 2001 and 2006 IIRC.
  • In addition to splits and lumps, there are also simply brand new nations on occasion.
  • A bit parallel to "common" and "scientific" bird names, there are official internal names (aka "endonym" - name they use themselves) and names used by others ("exonym").
  • Also of note: while names tend to change frequently, the codes are normally much more stable. The big deal geographically is that province codes -- and maps too -- were considered proprietary in many places until recently.

For those who stay in the "western" world, this might seem pretty obscure. After all, other than the breakdown of the Iron Curtain ~30 years ago, isn't everything stable? :)

ANYway. From where I sit:
  • It's true that Taxonomy "trees" are reasonably stable
  • Yet updates ARE needed pretty regularly for these big taxonomies (13000+ bird species, 3800+ 1st/2nd level geo)

I can see the simplicity of having no direct connection between Thesaurus and Keywords. I LOVE the simplicity of deleting part of a Thesaurus tree, and importing an updated version. Awesome.

What else would help to:
  • Verify that correct elements are in use, in general?
  • Identify elements that are subject to an upcoming split/lump?
  • Perhaps mark Thesaurus elements designed for user auto-fill (eg codes) vs those used to set keywords or other metadata?

That's all I have for the moment. More after I've made more progress.

As always, THANK YOU for a wonderful system -- and community!

rolandgifford

I now have a process which has worked for recent bird taxonomy updates prompted by suggestions from Mario

1. Create a text file with the new taxonomy. This is reasonably straightforward for birds as the taxonomy is published in Excel format and getting it into the correct layout isn't very difficult.

2. Delete the old taxonomy branch in the Thesaurus and import the new one.

3. Export the full Thesaurus to text file one

4. Save the new Thesaurus

5. Import Thesaurus from Database. This will add any species that I have allocated Keywords which are no longer in the Thesaurus

6. Export this thesaurus to text file two but don't save it.

7. Compare text files one and two. Any species which is in file two but not file one needs to be fixed. This may be the same species name but moved to a different family. It may be a renamed species. It has to be done manually.

8. Any other differences don't need any attention; the only ones that need fixing are the ones that have been used.
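The comparison in step 7 can also be scripted. Below is a minimal Python sketch, assuming the exported text files use one entry per line with one tab of indentation per hierarchy level. That layout is an assumption and should be checked against a real export. The function names and the sample data are hypothetical.

```python
def paths_from_export(text):
    """Turn tab-indented lines into full 'parent|child' paths (assumed layout)."""
    paths, stack = set(), []
    for line in text.splitlines():
        if not line.strip():
            continue                      # skip blank lines
        depth = len(line) - len(line.lstrip("\t"))
        del stack[depth:]                 # drop deeper levels from the previous line
        stack.append(line.strip())
        paths.add("|".join(stack))
    return paths


def only_in_second(text_one, text_two):
    """Paths present in export two but missing from export one."""
    return sorted(paths_from_export(text_two) - paths_from_export(text_one))


# Hypothetical sample exports (placeholder names, not a real taxonomy).
file_one = "Birds\n\tFamily A\n\t\tSpecies X\n\tFamily B\n"
file_two = "Birds\n\tFamily A\n\t\tSpecies X\n\tFamily B\n\t\tSpecies Y\n"

needs_fixing = only_in_second(file_one, file_two)
print(needs_fixing)
```

Comparing full paths rather than bare names means a species that kept its name but moved to a different family still shows up as a difference. With real files, read each export with `open(...).read()` and pass the two strings in.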

MrPete

Quote from: rolandgifford on August 26, 2024, 08:05:44 PM
I now have a process which has worked for recent bird taxonomy updates prompted by suggestions from Mario
5. Import Thesaurus from Database. This will add any species that I have allocated Keywords which are no longer in the Thesaurus

YOWZA! That is the key feature. A little obscure, but YES, that essentially accomplishes the needed "diff" function :) 

Now, to learn about options for alternates/aliases... I'm hoping I can set up so she can use the 4-letter Quick Codes so common in eBird etc.

rolandgifford

Quote from: MrPete on August 26, 2024, 10:12:52 PM
Now, to learn about options for alternates/aliases... I'm hoping I can set up so she can use the 4-letter Quick Codes so common in eBird etc.

I don't use alternates/aliases, so I can't help with those. The Clements/eBird spreadsheets presumably contain the codes (they seem to be up to 7 characters; I use IOC, so no clue), so creating a text file for import that includes them should be doable.

As you probably already know, you can download them from here: https://www.birds.cornell.edu/clementschecklist/introduction/updateindex/

Mario

I've added an improvement to the Thesaurus for IMatch 2025.

The Thesaurus now tracks which keywords were added when you run the "Import from Database" command and displays the number of imported keywords when completed.
It also copies the paths of the added keywords into the Windows clipboard, which makes it easy to review what was added aka the "diff". You can just paste it from there into Windows Notepad or Excel or whatever.


Image1.jpg

rolandgifford