A check to Validate the XML in thesaurus IMTHS files

Started by ColinIM, July 30, 2019, 10:29:40 PM

ColinIM

For background, please see this post:
A Thesaurus Merge resulted in a thesaurus overwrite
https://www.photools.com/community/index.php?topic=9206.0

At present the only way to check the structure and validity of the XML in an IMatch thesaurus / .imths file is to load the file into IMatch.

The Thesaurus Manager allows us to export and import and merge our own versions of thesaurus / .imths files, but we have no way to run a health check or a validity test either on the currently loaded thesaurus or on any of (what could be many) versions of 500kB and larger external .imths thesaurus files.

I accept that in theory, IMatch is the only software that should operate upon our currently loaded thesaurus or on any external thesaurus file, but I know from personal experience that thesaurus errors can be reported, even though no other software has touched my thesaurus files. IMatch already does some checks on each thesaurus .imths file that we load into IMatch, but in the rare event that a thesaurus file has become corrupted or faulty, we will only see a brief error message rejecting the file.

The 'reject' error message tells us nothing about which aspect of the thesaurus file caused it to be rejected, and with a thesaurus that can contain many thousands of Keyword entries plus multiple deep Keyword Group structures (and many of mine include non-English terms and phrases), this can be hugely frustrating - and impossible to diagnose.

My request is for the inclusion in Thesaurus Manager of an informative Thesaurus confidence check or similar.

In other words, please provide:

(a) some method of confirming the validity, integrity and 'correctness' etc. of the XML in the thesaurus that is currently loaded in IMatch.

(b) some method of confirming the validity etc. of any external thesaurus .IMTHS file that the User might wish to load into IMatch - or might wish to merge into IMatch's already 'live' thesaurus.


Mario

Since the introduction of IMatch 5 many years ago, no error was ever reported regarding the thesaurus import/export.
As far as I can tell, your bug report from yesterday is the only problem ever reported. I will investigate this of course and fix the bug if there is one.

I think that there are not many users who ever export/import thesauri or merge thesauri.
Implementing a "validation" of sorts could take a long time without any benefit for the majority of the user base.
If there is a bug in the thesaurus that prevents merge for all/some thesauri files, it has gone unnoticed for many years...

IMatch uses an XML-based text format for the thesauri to allow for easy import into other applications (or producing thesauri for IMatch in other applications).
This allows "validation" by just loading the file into a normal text editor.
-- Mario
IMatch Developer
Forum Administrator
http://www.photools.com

ColinIM

Quote from: Mario on July 31, 2019, 12:35:26 AM
Since the introduction of IMatch 5 many years ago, no error was ever reported regarding the thesaurus import/export.
(....)

Yes, this is a very significant point. Without a doubt. But still, this wouldn't discourage me from being (perhaps) the very first person to report my experience of such a problem  :)

Quote from: Mario on July 31, 2019, 12:35:26 AM
(....) I will investigate this of course and fix the bug if there is one.

Thank you.

Quote from: Mario on July 31, 2019, 12:35:26 AM
(....) IMatch uses an XML-based text format for the thesauri to allow for easy import into other applications (....)
This allows "validation" by just loading the file into a normal text editor.

Sorry, I respectfully disagree on how 'easy' it might be to validate a long and complex XML file.

Assuming we can get past the hurdle of loading (in my case at least) a 700 kB XML/.imths file "into a normal text editor", there is surely a lot more to 'validating' the XML in an XML file than scrolling through it page by page (for example) in a text editor?
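
To illustrate the gap between reading a file and checking it, here is a minimal sketch of a well-formedness test using Python's standard library. It only confirms that the file parses as XML and, if not, reports where the parser stopped. It says nothing about whether IMatch would accept the content, and the function name `find_xml_error` is just an illustrative choice.

```python
import xml.etree.ElementTree as ET


def find_xml_error(path):
    """Return None if the file is well-formed XML.

    Otherwise return (line, column, message) for the first parse error.
    This checks well-formedness only, not IMatch-specific rules.
    """
    try:
        ET.parse(path)
    except ET.ParseError as err:
        line, column = err.position  # line is 1-based, column is 0-based
        return line, column, str(err)
    return None


# Example call (the path is a placeholder):
# print(find_xml_error("my_thesaurus.imths"))
```

A result of `None` means the parser read the file to the end. A tuple points at the first place it gave up, which is usually at or shortly after the actual mistake (for example an unclosed tag).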

(I hesitate to overstate my points here Mario and I sincerely don't want to be argumentative, but I do want to add two more points to explain why I felt this Feature Request was at least worth 'airing'.)

1.  My initial attempts to use Notepad to 'merge' the carefully selected segments of my two .imths files failed, in spite of my careful scrutiny in Notepad of the 700+ kB merged XML file (taking care to include the XML header etc.).  IMatch refused to load the newly merged file, with an ERROR popup saying my newly merged file ".... could not be imported. The format of the file is wrong".  I needed to use a 'proper' XML Editor afterwards to discover the glitch that I had missed while trying to use 'just' Notepad.

2.  In both of the XML Editors that I eventually used there were options to 'validate' the XML in these thesaurus files, but neither tool was able to do a complete Validation of the XML unless I had first supplied the appropriate 'DTD Schema' (or a W3C Schema or an "associated XSD file" etc.), and I reasoned that it would be a step too far to ask you to supply me / us with a copy of the ptthes.xsd file which is invoked on line 4 of each .imths file ...

<pt_thesaurus xmlns:ptthes="http://schemas.photools.com/ptthes.xsd" vendor="photools.com" version="1.0.0">

... and I wasn't even sure if that ptthes.xsd file would in fact be sufficient to fully validate a thesaurus / .imths file .... hence this Feature Request.
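
Without a schema, a looser sanity check is still possible. The sketch below is an assumption-laden example, not anything IMatch provides: it reads the root element name and attributes (the `pt_thesaurus` element with `vendor` and `version`, as in the line quoted above) and counts how often each element name occurs, so two versions of a thesaurus can be compared at a glance. The function name `summarize_thesaurus` is made up for this example, and no knowledge of the inner element names is assumed.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_thesaurus(path):
    """Return (root_tag, root_attributes, element_counts) for an XML file.

    Raises xml.etree.ElementTree.ParseError if the file is not well-formed.
    The counts include the root element itself.
    """
    root = ET.parse(path).getroot()
    counts = Counter(element.tag for element in root.iter())
    return root.tag, dict(root.attrib), counts


# Example call (the path is a placeholder):
# tag, attrs, counts = summarize_thesaurus("my_thesaurus.imths")
# if tag != "pt_thesaurus":
#     print("Unexpected root element:", tag)
# print(attrs.get("version"), counts.most_common(5))
```

This does not replace validation against an XSD. It only shows whether the overall shape of a file (root element, version attribute, rough element counts) matches a file that is known to load.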

I invite you to close this Feature Request Mario, but I did think it was worth giving it a try.
Respectfully yours,
Colin P.

Mario

I don't maintain schemas for these files. Way too complex to create and maintain.
The format is unchanged since IMatch 5 and since then has just worked. If there is a problem that prevents your thesaurus (or others) from merging, I will fix it. Not worth spending any time on doing XML schemas for this.