A Thesaurus Merge resulted in a thesaurus overwrite

Started by ColinIM, July 26, 2019, 02:48:15 AM


ColinIM

On both of my computers I am running the latest IMatch version 2019.6.2 on Windows 7 x64.

Before I raise a bug report Mario, could you confirm please whether or not my (attached) Thesaurus .imths files are somehow faulty?

For example, could my use of the '#' character to label my Group-level keyword entries be a part of my problem here, even though the '#' characters are used (or they were used) only within my current thesaurus, and were not being used in the smaller thesaurus file that I had attempted to merge into it?

MY SYMPTOM 1:
My attempt to merge a recently exported .imths file into my 'main' Thesaurus resulted in an overwrite of my 'main' computer's Thesaurus,
causing the loss of (at least) all of my existing Thesaurus's 'Group' entries.

MY SYMPTOM 2:
(although this is a minor concern compared to my inability to merge my thesauruses / thesauri ...)
I attempted today (Thursday) to re-populate the 'current' Thesaurus with some of the existing metadata in my database, using Thesaurus Manager's "Import from database" option.  This appeared to have no effect on my existing thesaurus, or at least it did not re-create any of the Keyword Group structure that had existed prior to my failed thesaurus merge, described in 'SYMPTOM 1' above and in my notes below.

Cross referring to my notes below, I have attached two IMatch LOG files (zipped) both taken while in debug mode.
(Yes, I'm content to suffer whatever performance hit comes from always running IMatch in debug mode.)

The first IMatch LOG file is from Saturday 20 July when I first merged the .imths file from my QTEST_PC into this 'main' PHOTO_PC's thesaurus:
2019-07-20d IMATCH6_LOG as at 21.55 on 20 July.TXT.zip

The second IMatch LOG file is from earlier today (Thursday), after I'd attempted unsuccessfully to re-populate the Thesaurus with some of the existing metadata, using Thesaurus Manager's "Import from database" option:
2019-07-25b IMATCH6_LOG as at 21.04 on 25 July.TXT.zip

Here are the steps I took to get to this point.

STEP 1.
In IMatch on my 'test' computer (I'll call it my "QTEST_PC") I had extended its thesaurus to include a significant "Linnaeus" type keyword structure (with Family | Genus ... etc.).

I then wanted to merge that Linnaeus keyword structure into the thesaurus on my 'main' computer (which I'll label here as my "PHOTO_PC").

On my QTEST_PC I exported its thesaurus as an imths-format file and (for quick reference) also as a text file.
I've combined those two files below, inside the ZIP file called:
zip1-2019-07-20a IM2019 (QTEST_PC) Thesaurus KEYWORDS (with Linnaeus).zip
Note that the (unzipped) size of this QTEST_PC .imths file is 387,123 bytes.

STEP 2.
Moving to my main PHOTO_PC, I took a precautionary backup of its thesaurus before starting the thesaurus merge. Again I saved both the .imths version and a text version of this 'main', 'working' thesaurus.

I've attached those 'backup' files below, inside the ZIP file called:
zip2-2019-07-20b IM2019 (PHOTO_PC) Thesaurus KEYWORDS Backup.zip
Note that the (unzipped) size of this backup 'working' .imths file is a chunky 696,499 bytes.

STEP 3.
Still on my main PHOTO_PC I opened the Thesaurus Manager (see my screen-grab with the filename beginning - (A)2019-07-25-21.54b ...) - and selected the Menu option as shown. Then I browsed for and 'opened' the .imths file that I'd exported from my QTEST_PC.

In the "Import Thesaurus" pop-up I made sure to select "Merge with current thesaurus", as shown in my representative screen-RE-grab with the filename beginning - (B)2019-07-25-22.46 ...

Once the 'merge' had been done - with zero reported errors and with no further feedback from IMatch after I had clicked the OK button - I then took a further backup/export of this now (ostensibly) merged thesaurus.

This 'now merged' .imths file and its text version are attached below, inside the ZIP file called:
zip3-2019-07-20c IM2019 (PHOTO_PC+Merged QTEST_PC) Thes KEYWORDS Bkup.zip
I see in retrospect that the (unzipped) size of this 'apparently-merged' .imths file - at just 431,258 bytes - is well below the size of the backup/export I made of my pre-merged 'working' thesaurus.
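
The size comparison above can be turned into a quick sanity check to run before trusting a merge. This is only a rough heuristic sketched in Python: it assumes that a genuine merge should not produce an export smaller than the larger of its two inputs, and file size is at best a crude proxy for content. The function name is made up for illustration; the byte counts are the ones quoted in this post.

```python
def merge_looks_suspicious(size_before: int, size_imported: int, size_after: int) -> bool:
    """Return True when the export taken after a merge is smaller than the
    larger of the two inputs, hinting that data was replaced rather than merged.

    Heuristic only: a smaller file does not prove data loss.
    """
    return size_after < max(size_before, size_imported)


# Unzipped .imths sizes in bytes, as quoted in this post.
photo_pc_backup = 696_499  # 'working' thesaurus before the merge
qtest_pc_export = 387_123  # smaller thesaurus that was merged into it
after_merge = 431_258      # export taken straight after the 'merge'

print(merge_looks_suspicious(photo_pc_backup, qtest_pc_export, after_merge))
```

With the three sizes above the check flags the merge, which matches what the screen-grabs later showed.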

STEP 4.
Now believe it or not, I had totally trusted this 'merge' operation on Sat 20 July, and (foolishly in hindsight) I did not stop to examine the 'merged' thesaurus before spending a further day on Wednesday expanding and re-organising this (now fragmented and incomplete) thesaurus, and adding a clutch of new keywords etc. into it.

So the penny dropped today that the 'merge' had not gone as I'd expected, and as is shown in my screen-grab with the filename beginning - (C)2019-07-25-21.57b ... - my 'working' thesaurus had been effectively replaced by the smaller one that I'd tried to merge into it.  Compare the highlighted areas in the two screen-grabs.

STEP 5.
Today I attempted to re-populate the Thesaurus with some of those complex Group entries that I had evidently lost during the merge, by using the additional option in Thesaurus Manager to "Import from database" - but frustratingly, this seemed to have zero effect on my 'current' thesaurus.

In other words, none of the set of 8 Groups shown in that first screen-grab - #Lord Mayors of Bristol, or #Pharmacy etc. - were rebuilt or reconstituted into my thesaurus from the metadata that still exists (thank goodness) in this particular "PHOTOS" database.

(Some of those Thesaurus Groups - for example #ATTIRE - relate to the metadata in another of my databases, not this 'main' PHOTOS-related database.)

My IMatch LOG file from today (see attachment) will also show me attempting - just once if I recall correctly! - to switch back to an earlier version of the thesaurus before eventually returning to the not-actually-merged version on which I had spent a whole day on Wednesday, adding yet more complexity to what I now know is an incomplete thesaurus.

Anyway, I've stopped digging this hole, and - perhaps unnecessarily but for possible diagnostic purposes - I re-exported "today's" version of my (incomplete) thesaurus ... and I've attached a copy below in the ZIP file called:
zip4-2019-07-25a IM2019 (PHOTO_PC) Thesaurus KEYWORDS Backup.zip
The (unzipped) size of this 'incomplete' .imths file is  482,961 bytes.

Incidentally I have tested and confirmed that each of these .imths files can be re-loaded into Thesaurus Manager without any (obvious) errors.

My hope is that - once I know what's wrong here - I'll be able to RE-merge today's 'fragmented' thesaurus, with its embedded Linnaeus Groups and all of its updates from Wednesday's sessions, into the previously good 'working' thesaurus which I'd backed up prior to attempting this merge!

Yours hopefully.

ColinIM

Mario, I've waited too long since I submitted the post above so I'm unable to 'Modify' the post.

Would you please delete my zipped Thesaurus files from that post (and perhaps delete this follow-up request too).
Thank you.

I'm usually an absolutist when it comes to (not) 'offering' my personal data to the world, and although there's nothing that's especially precious inside my thesauruses / thesauri, I'm now uncomfortable - on reflection - that I've 'offered' them here so freely for possible perusal by anyone on this public Forum!

Thank you again.

sinus

Quote from: ColinIM on July 26, 2019, 07:04:25 AM
Mario, I've waited too long since I submitted the post above so I'm unable to 'Modify' the post.

Would you please delete my zipped Thesaurus files from that post (and perhaps delete this follow-up request too).
Thank you.

I'm usually an absolutist when it comes to (not) 'offering' my personal data to the world, and although there's nothing that's especially precious inside my thesauruses / thesauri, I'm now uncomfortable - on reflection - that I've 'offered' them here so freely for possible perusal by anyone on this public Forum!

Thank you again.

Hi Colin
I have deleted that zip, but stored a copy here, just in case I made a mistake or Mario should have it for some testing.
I know, that feeling uncomfortable is not nice.  8)

(and, Colin, it was downloaded 0 times)
Best wishes from Switzerland! :-)
Markus

Mario

Please send the original (unmerged thesaurus) and the thesaurus you have merged into this to my support email address. I will look into it but it may take a while. My inbox is full with files and data sent by users for me to analyze or check...
-- Mario
IMatch Developer
Forum Administrator
http://www.photools.com  -  Contact & Support - Follow me on 𝕏 - Like photools.com on Facebook

sinus

Quote from: Mario on July 26, 2019, 08:57:08 AM
Please send the original (unmerged thesaurus) and the thesaurus you have merged into this to my support email address. I will look into it but it may take a while. My inbox is full with files and data sent by users for me to analyze or check...

I have sent this zip to Mario's support email address, with a reference to Colin and this thread.

Best wishes from Switzerland! :-)
Markus

ColinIM

@Markus / @sinus,
Thank you very much for holding back copies of the ZIPs and for deleting them!
(I've returned here again now (in case clarification was necessary) to confirm that of course I would want Mario to keep copies of the thesaurus files for diagnostic purposes, as you'd anticipated ... but you beat me to it  :) ).

You will guess I'm sure that I had double-wrapped the four thesaurus ZIPs into one ZIP file in order to stay within the Forum's limit of five attachments maximum per posting.

Quote from: Mario on July 26, 2019, 08:57:08 AM
Please send the original (unmerged thesaurus) and the thesaurus you have merged into this to my support email address. I will look into it but it may take a while. My inbox is full with files and data sent by users for me to analyze or check...
@Mario
I had already included those exact thesaurus versions in the ZIPs that I attached to my post, and which Markus has kindly held onto, but I will re-send them to you via email ... and I'll include a link in that email to this Forum thread.

For possible cross reference ...

The original unmerged thesaurus-export is in this ZIP file:
zip2-2019-07-20b IM2019 (PHOTO_PC) Thesaurus KEYWORDS Backup.zip

and the smaller thesaurus-export which I attempted to merge into it is inside this ZIP file:
zip1-2019-07-20a IM2019 (QTEST_PC) Thesaurus KEYWORDS (with Linnaeus).zip

Thank you again Markus and Mario.

ColinIM

Quote from: sinus on July 26, 2019, 09:35:53 AM
Quote from: Mario on July 26, 2019, 08:57:08 AM
Please send the original (unmerged thesaurus) and the thesaurus you have merged into this to my support email address. I will look into it but it may take a while. My inbox is full with files and data sent by users for me to analyze or check...

I have sent this zip to Mario's support email address, with a reference to Colin and this thread.
Ah!!! Markus you are a STAR!  8) Thank you.

(Our posts overlapped.)

So Mario, I will now not re-send those ZIPs to your support email.

I trust that my zipped IMatch LOG files are still attached to my original post, in case they're useful later.

My appreciation to both of you.
Colin P.

sinus

Quote from: ColinIM on July 26, 2019, 09:59:08 AM
(....)
Ah!!! Markus you are a STAR!  8) Thank you.
(....)

Fine, Colin, you're welcome.
That is OK, because I sent the original zip, which I deleted, to Mario, and I am sure that he will look at it when he has the time.
So you are correct, it is not necessary for you to send it to Mario, because I did it already.  :D

Your kind words are very nice, thanks for them!
To be honest, I deleted the zip quickly because I had seen that nobody had downloaded it, and because I have "known" you a long time and feel that you are a more cautious and sensitive man (of course only my feeling, which can be wrong), I decided to delete it immediately so that you could relax.

And since I have also "known" Mario a long time  8)  I was quite sure that he would ask for the file if he wanted it and .... he did, because he is a great programmer with really outstanding support.

Have a good weekend!  :)
Best wishes from Switzerland! :-)
Markus

Jingo

Quote from: sinus on July 26, 2019, 01:45:20 PM
Quote from: ColinIM on July 26, 2019, 09:59:08 AM

Ah!!! Markus you are a STAR!  8) Thank you.

My appreciation to both of you.
Colin P.


To be honest, I deleted the zip quickly because I had seen that nobody had downloaded it, and because I have "known" you a long time and feel that you are a more cautious and sensitive man (of course only my feeling, which can be wrong), I decided to delete it immediately so that you could relax.

And since I have also "known" Mario a long time  8)  I was quite sure that he would ask for the file if he wanted it and .... he did, because he is a great programmer with really outstanding support.

Have a good weekend!  :)


What an AWESOME community we have here... superb moderators looking out for our members!  I am so very happy to be a part of the community!!

Mario

This thread now gets a bit diluted.

From your initial post I understand that you had problems when merging a thesaurus file B into a thesaurus built from file A. Is this correct?
So I need file A to fill a thesaurus and file B to import it and see if I can see the problem here as well. May take a while, much stuff in my inbox to check.

ColinIM

Quote from: Mario on July 26, 2019, 03:39:09 PM
From your initial post I understand that you had problems when merging a thesaurus file B into a thesaurus built from file A. Is this correct?

Yes.

Quote from: Mario on July 26, 2019, 03:39:09 PM
(....) So I need file A to fill a thesaurus and file B to import it and see if I can see the problem here as well.

The single ZIP file that Markus has forwarded to you has the filename:
zip1 to zip4 - 2019-07-20 and 2019-07-25 - FOUR pairs of thesaurus files.zip
and it contains four smaller ZIP files.

I trust you'll be able first to unzip that single ZIP file,
and then secondly to identify the following two smaller files ...

The thesaurus you have labelled 'A' - The original unmerged thesaurus:
zip2-2019-07-20b IM2019 (PHOTO_PC) Thesaurus KEYWORDS Backup.zip

and the thesaurus you have labelled 'B' - the smaller thesaurus which I attempted to merge into it:
zip1-2019-07-20a IM2019 (QTEST_PC) Thesaurus KEYWORDS (with Linnaeus).zip

You could then delete the remaining two smaller ZIP files from that set of four:
zip3-2019-07-20c IM2019 (PHOTO_PC+Merged QTEST_PC) Thes KEYWORDS Bkup.zip
zip4-2019-07-25a IM2019 (PHOTO_PC) Thesaurus KEYWORDS Backup.zip
or perhaps hold onto them if necessary to cross-refer to their relevance in my original post.


Quote from: Mario on July 26, 2019, 03:39:09 PM
(....) May take a while, much stuff in my inbox to check.

Thank you Mario, and I totally understand about the likely timescale.

ColinIM

My original post above was a result of me attempting to use Thesaurus Manager to merge two thesaurus versions.

I had built some Keyword Group structures in my smaller thesaurus on my 'test' computer, and I wanted to merge those Keyword Group structures into my 'main' (larger) thesaurus.

IMatch's Thesaurus Manager was failing absolutely - and today is still failing - to do any merging of the two thesaurus files. Instead, the 'imported' thesaurus completely overwrites the 'current' thesaurus every time, in spite of me ticking the 'Merge with current thesaurus' radio button. (See my attached image above, labelled as: (B)2019-07-25-22.46 (ColinIM) Thesaurus Merge option )

I'm adding this note to report the following:

1.  I have now made multiple further Thesaurus Manager tests with pairs of smaller, abbreviated thesaurus files, trying to achieve the promised 'Merge with current thesaurus' ... but in every test, the 'current' thesaurus was overwritten and replaced by the 'new' thesaurus file to which I had browsed using Thesaurus Manager.

2.  For the sake of brevity I will not describe my later tests. My test results were effectively identical to those in my original post.

3a.  I have now been able to achieve the thesaurus 'merge' that I needed by using a third-party XML tool (details below) - with what turned out to be a relatively simple XML Cut & Paste operation.

3b.  However, other than by importing my merged thesaurus .IMTHS file into IMatch, I had no way of validating the XML against IMatch's pt_thesaurus XML schema.  But after a lot of scrutiny and scrolling through my newly merged thesaurus, everything seems OK.

4.  I will now raise a BUG report on the Thesaurus Manager's 'merge' option - cross referenced to this already long post - and I look forward if possible and if appropriate to being persuaded that this is not a Bug .... although I'm convinced that it is a Bug.

5.  I will also raise a Feature Request, asking for the inclusion in Thesaurus Manager of some method of:

(a) confirming the validity, integrity or 'correctness' etc. of the  thesaurus that is currently loaded in IMatch.

(b) confirming the validity etc. of any external thesaurus .IMTHS  file that the User might wish to load into IMatch - or to merge into IMatch's already 'live' thesaurus.

Here are the details of the old-but-still-useful XML Editor that I used to merge my thesaurus files, although with the important caveat that without access to the IMatch "pt_thesaurus" XML schema, this Editor tool was only able to partially Validate the XML in my merged thesaurus file.

XML Copy Editor version 1.2.1.3
http://xml-copy-editor.sourceforge.net/
License: GNU / GPL
It was last updated in 2014.
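
For anyone who would rather script this kind of cut-and-paste merge than do it by hand, here is a minimal Python sketch. It assumes a purely HYPOTHETICAL XML layout (nested `item` elements identified by a `name` attribute); the real .imths / pt_thesaurus schema is not described in this thread and will almost certainly differ, so the element and attribute names would need adapting. The function name and the sample keywords inside the groups are also invented for illustration.

```python
import copy
import xml.etree.ElementTree as ET


def merge_elements(target: ET.Element, source: ET.Element, key: str = "name") -> None:
    """Recursively copy into `target` every child of `source` that it lacks.

    Children are matched on their tag plus the value of the `key` attribute.
    """
    existing = {(child.tag, child.get(key)): child for child in target}
    for child in source:
        match = existing.get((child.tag, child.get(key)))
        if match is None:
            target.append(copy.deepcopy(child))  # new branch: copy it whole
        else:
            merge_elements(match, child, key)    # shared branch: descend into it


# HYPOTHETICAL structure, NOT the real pt_thesaurus schema.
MAIN = '<thesaurus><item name="#Pharmacy"><item name="Aspirin"/></item></thesaurus>'
TEST = ('<thesaurus><item name="#Pharmacy"><item name="Ibuprofen"/></item>'
        '<item name="#TAXONOMY"><item name="Family"/></item></thesaurus>')

main_root = ET.fromstring(MAIN)  # parsing also confirms the XML is well-formed
merge_elements(main_root, ET.fromstring(TEST))
print(ET.tostring(main_root, encoding="unicode"))
```

Parsing with `ET.fromstring` only confirms that the XML is well-formed. Like my Editor tool, it cannot validate the result against the pt_thesaurus schema, so a backup of the 'working' thesaurus is still essential before importing anything produced this way.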

ColinIM

Two quick follow-up points from  my original post above.

Referring to my "SYMPTOM 2" ...

Quote from: ColinIM on July 26, 2019, 02:48:15 AM
(....)
MY SYMPTOM 2:
( ... snip ...)
( ....) to re-populate the 'current' Thesaurus with some of the existing metadata in my database ( ... )
did not re-create any of the Keyword Group structure that had existed prior to my failed thesaurus merge (....)
( ... snip ...)

Follow-up Point 1:
Since I successfully loaded a thesaurus that I had merged outside of IMatch, this general symptom has now gone.

I have now successfully used Thesaurus Manager to ingest into my thesaurus, the few Keywords which had been written into some images  while an earlier version of my thesaurus was 'active'.

Follow-up Point 2:
Regarding the re-construction (from existing Keywords) of the Keyword Group structures (such as the "#Pharmacy" Group seen in my screen-captures) - I have since read somewhere among the extensive IMatch hints, tips and Help pages that - depending upon our Metadata Preference settings - these Keyword Group structures might not be re-created during a re-scan and re-ingestion of Keywords from our image files.  So this is also not a Bug!

I'm sorry that I cannot relocate where I learnt this detail, in spite of a lot of searching prior to adding this note.

Finally ... a bonus TL;DR detail about my Keyword Group names:

I have prefixed my Keyword Group names with the '#' character in order to bring them always to the 'top' of the Thesaurus structure, and to the top of my Keyword Panel.  These Keyword Groups would otherwise be 'lost' in standard alphabetic order among my thousands of other alphabetically-sorted Keywords.
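
The effect of the '#' prefix can be seen with a tiny Python sketch. It assumes a plain, case-insensitive character-order sort, in which '#' comes before every digit and letter; I have not confirmed the exact collation that IMatch uses, and the function name and the unprefixed sample keywords are just placeholders.

```python
def sort_keywords(names: list[str]) -> list[str]:
    """Sort names case-insensitively by character code (assumed ordering)."""
    return sorted(names, key=str.casefold)


keywords = ["Pharmacy", "Zebra", "#Pharmacy", "Attire", "#ATTIRE", "2019"]
print(sort_keywords(keywords))
# ['#ATTIRE', '#Pharmacy', '2019', 'Attire', 'Pharmacy', 'Zebra']
```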

Finally finally, referring to the second of my Thesaurus Manager screen-grabs above, my use of those square brackets '[' and ']' around the Keyword Groups that I've named "LINNAEUS_REF" and "TAXONOMY" was a temporary mistake!  I now also prefix those two Group names with the '#' character.

(I had forgotten that when a Keyword Group appears in our Keywords Panels, the Keywords Panel already wraps the Group name in the same square brackets, so of course I was seeing double square brackets on those two Group names!)

Thank you for listening!

Mario

After looking at the code and docs, this appears to be a misunderstanding of what the "merge" option does.

The merge option operates on a per-tag basis.
If you import a thesaurus, it normally empties the existing thesaurus (all elements for all tags) and replaces them with the tag data contained in the imported thesaurus.

If you enable the "merge option", tag data that only exists in the existing thesaurus is retained. Again, on a per tag basis.

Example:

If your existing thesaurus has data for keywords, headlines and title and the thesaurus you import from has only data for keywords, this is what happens:

1. No merge.
The thesaurus only has data for keywords after the import.

2. Merge
The thesaurus has data for keywords (a copy of the keywords from the imported thesaurus) and also still has the data for headline and title.

IMatch does not somehow attempt to merge data imported with data for the same tag that already exists in the thesaurus.
I think this is what you expect, that the import merges the keywords you import with the existing keywords.

But that's not how this works. This was never different, the code is unchanged since IMatch 5.
Merging thesauri is just not something people need to do often. Or at all.
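
The per-tag behaviour described above can be modelled in a few lines of Python. This is only an illustrative sketch of the described semantics, not IMatch's actual code: the dictionary layout (tag name mapped to a list of elements), the function name and the sample values are all assumptions made for the example.

```python
def import_thesaurus(current: dict[str, list[str]],
                     imported: dict[str, list[str]],
                     merge: bool) -> dict[str, list[str]]:
    """Model the import as described: whole tags are replaced, never blended."""
    # Without merge the existing thesaurus is emptied first;
    # with merge the existing tags are kept as the starting point.
    result = {tag: list(items) for tag, items in current.items()} if merge else {}
    for tag, items in imported.items():
        result[tag] = list(items)  # imported data replaces that tag entirely
    return result


current = {
    "keywords": ["#Pharmacy", "#ATTIRE"],
    "headline": ["Sample headline"],
    "title": ["Sample title"],
}
imported = {"keywords": ["Family", "Genus"]}

print(import_thesaurus(current, imported, merge=False))
print(import_thesaurus(current, imported, merge=True))
```

In both calls the existing keywords are gone afterwards. The merge option only decides whether the headline and title data survive.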

You can achieve what you want (to some extent) using the following workflow:

1. You export the thesaurus you want to import elsewhere to text (with the corresponding option in the Thesaurus Manager).
Note that this has to be done for each tag individually.

2. You import the text file into the target thesaurus using the merge option. If a plain text thesaurus is imported, the data is merged with the existing data, merging your keywords.

Note that some of the extended properties available in the thesaurus for elements and maybe synonyms may not be imported correctly. This depends on your thesaurus structure.

ColinIM

Thank you Mario.

Quote from: Mario on August 01, 2019, 02:21:47 PM
After looking at the code and docs, this appears to be a misunderstanding of what the "merge" option does.
The merge option operates on a per-tag basis.

Yes. A misunderstanding. This behaviour is now clearer to me.

With your use of the word misunderstanding can I respectfully hint that - with the existing documentation on Thesaurus Manager - this is one of those really, really rare instances where IMatch's exemplary documentation might be improved!?

Quote from: Mario on August 01, 2019, 02:21:47 PM
(....)
IMatch does not somehow attempt to merge data imported with data for the same tag that already exists in the thesaurus.
I think this is what you expect, that the import merges the keywords you import with the existing keywords.

Yes. That was exactly what I thought would happen.  Would you consider changing that Radio-button label on Thesaurus Manager's dialog-pane to say "Merge tags with current thesaurus" instead of just "Merge with current thesaurus"?

Before I raised this 'merge' question I had spent a lot of time searching and reading through the Help files on Thesaurus Manager, and I then ran multiple tests trying to figure out what was happening with my 'merge' attempts.  But (forgive me stating an obvious point here ...)  my misunderstanding was not corrected by anything that I read in the related mouse-over hints or in the Help pages.

Quote from: Mario on August 01, 2019, 02:21:47 PM
(....) This was never different, the code is unchanged since IMatch 5.(....)

This is very true of course, and this 'stability' within IMatch is one of its real strengths, but I'll always be ready to 'thump the machine' once in a while, if I think it's not doing what I think it should be doing  ;D

Quote from: Mario on August 01, 2019, 02:21:47 PM
Note that some of the extended properties available in the thesaurus for elements and maybe synonyms may not be imported correctly. This depends on your thesaurus structure.

Yes, I discovered this quite early in my experiments. I had initially tried to use text-only thesaurus exports and 'merges' but I saw (for example) that the many notes I've added to the Description fields on my Keywords were lost during these text-based transfers .... so I worked purely with .imths / XML files after that.

Thank you again Mario, and please consider adding some clarification to your already superb Help scheme, hopefully to reduce 'merge misunderstanding' in the future!

Mario

Feel free to use the "feedback" link at the bottom of each help topic to report wrong or missing information. I collect these feedback emails and work through them every couple of weeks when I push help updates.