Why would deleting unused (sample) thesaurus elements be slooooow?

Started by MrPete, December 21, 2024, 08:21:18 PM


MrPete

My sweetie is coming up to speed on IMatch.

Her first serious attempt has been quite frustrating, in what I would have thought is a simple task:

* Open Thesaurus Manager (for the first time ever)
* See a lot of elements she doesn't need
* Delete them (several top-level items) and save, since she wants to build her own structure.

What we observed: this took almost 15 minutes!
* In the log, it shows many ~10 second keyword deletes:
QuotePTThesaurusDatabaseUpdater::DeleteKeyword  'V:\develop\IMatch5\src\IMEngine\PTThesaurusDatabaseUpdater.cpp(414)'
12.21 10:56:50+ 9375 [6A54] 05  M>  <  0 [9375ms #sl]
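To see how these entries add up, the log can be scanned for them. Below is a minimal Python sketch. It assumes, based only on the two-line excerpt above, that each timing line directly follows the line naming its function and contains `[<N>ms ...]`. The real IMatch log layout may differ, and `slow_calls` is a hypothetical helper.

```python
import re

# ASSUMPTION: inferred from the two-line excerpt above only. A line naming the
# function ("Class::Method ...") is followed by a timing line containing
# "[<N>ms ...]". The real IMatch log format may differ.
FUNC_RE = re.compile(r"^(\w+::\w+)\s")
MS_RE = re.compile(r"\[(\d+)ms\b")


def slow_calls(lines, name="PTThesaurusDatabaseUpdater::DeleteKeyword", min_ms=1000):
    """Hypothetical helper: durations (ms) of `name` calls taking >= min_ms."""
    durations = []
    pending = False  # True right after a line naming the function we want
    for line in lines:
        func = FUNC_RE.match(line)
        if func:
            pending = func.group(1) == name
            continue
        timing = MS_RE.search(line)
        if pending and timing:
            ms = int(timing.group(1))
            if ms >= min_ms:
                durations.append(ms)
            pending = False
    return durations


log = [
    r"PTThesaurusDatabaseUpdater::DeleteKeyword  'V:\develop\IMatch5\src\IMEngine\PTThesaurusDatabaseUpdater.cpp(414)'",
    "12.21 10:56:50+ 9375 [6A54] 05  M>  <  0 [9375ms #sl]",
]
found = slow_calls(log)
print(len(found), sum(found) / 1000)  # 1 9.375
```

Run against the full debug log, `len(found)` and `sum(found)` would show how many slow deletions ran and how much of the total wait they account for.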

Context
* Reasonably fast system: i7-8700 (6 cores, 12 threads), 48GB RAM, database on a 3GB/sec M.2 drive
* Before deleting the elements, we checked several and verified none were in use (which makes sense: she's done nothing to use any existing Thesaurus elements to date.)
* She DOES have some existing keywords in imported photos
* The photo database IS reasonably big: 320k files
* There are a huge number of files pending metadata writeback (310k -- basically the entire imported collection)

I assume there must be something else about open panels or ?? that's causing this to be slow.

Potentially Related

* After waiting through the above, she added a hierarchical keyword to her 10k+ insect photos and saved
* At this point, IMatch is not doing anything at all, and nothing is being logged.

YET: depending on which folder is selected in Media & Folders, the keyword pane shows "Updating..." in the center!

* I've tried opening sub-folders and seen that there is NO "Updating" for any of those, yet the parent folder still says "Updating". The system is completely idle, yet still "Updating".

Anything I should look for in the log? I've turned on debug logging if that might help.

THANKS!


Mario

Have you enabled the option to apply thesaurus changes to the database?
In that case, IMatch would have to check each deleted keyword in each file in the database to see if it has to be removed.

I've just checked: removing a hierarchy of about 3,000 keywords from the thesaurus with that option off takes maybe one second. It's an all-in-memory operation, super-fast. Saving a huge thesaurus takes maybe 3 seconds.
Unless you tell it to apply changes to the database, which takes a lot more work.

See Updating Keywords in the Database from Thesaurus Changes
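The difference can be pictured with a toy model. The names are hypothetical and this is not IMatch's code. With the option off, a deletion only edits the in-memory thesaurus. With "Apply to entire database", every file is visited and checked for the deleted keywords, whether or not it carries them. For illustration, keywords are modelled as pipe-separated path strings.

```python
# Toy model only: `thesaurus` is a set of keyword paths, `files` maps a file
# name to the set of keyword paths assigned to it. Not IMatch's data model.


def delete_from_thesaurus(thesaurus: set[str], deleted: list[str]) -> None:
    # Option off: only the in-memory thesaurus changes; no file is touched.
    thesaurus.difference_update(deleted)


def apply_to_entire_database(files: dict[str, set[str]], deleted: list[str]) -> int:
    # Option on: visit every file in the database, even the (possibly many)
    # files that turn out to need no change. Returns the number of files changed.
    deleted_set = set(deleted)
    changed = 0
    for keywords in files.values():
        hits = keywords & deleted_set
        if hits:
            keywords.difference_update(hits)
            changed += 1
    return changed


thesaurus = {"Insects|Beetles", "Places|Paris", "Sample|Flowers"}
files = {"a.jpg": {"Insects|Beetles"}, "b.jpg": {"Places|Paris"}}
delete_from_thesaurus(thesaurus, ["Sample|Flowers"])
print(apply_to_entire_database(files, ["Sample|Flowers"]))  # 0, but both files were checked
```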


Quote* After waiting through the above, she added a hierarchical keyword to her 10k+ insect photos and saved
Added the keyword where? Selecting 10K files and then adding the keyword in the Keywords Panel?
Adding keywords in the thesaurus does not impact your files and takes 0.1 seconds.

Quote* There are a huge number of files pending metadata writeback (310k -- basically the entire imported collection)
Normal. Point the mouse cursor at the pen to see the first 10 tags to write.

See Metadata for Beginners for background info on why the rich and complete metadata record IMatch produces when importing files almost always requires a write-back.
-- Mario
IMatch Developer
Forum Administrator
http://www.photools.com  -  Contact & Support - Follow me on 𝕏 - Like photools.com on Facebook

MrPete

Wait a sec.

SO:
* If I look at a thesaurus keyword and search for any photos w/ that keyword, it "instantaneously" knows there are none.
* Yet, if I delete the keyword (or 100 of them), it can't do the same thing to verify whether / which files need updating?

Something seems fishy about that.

What about the ongoing "Updating..." indicator, which seems unrelated to any current or pending activity?


Mario

Do you have the option to apply changes done to the thesaurus to the database enabled?

Quotey photos w/ that keyword, it "instantaneously" knows there are none.
Searching the entire database for a keyword should be fast ;)

I've just made some tests with a 920,000 files database (database on SSD).

Deleting a keyword (3 levels deep) from the thesaurus without applying changes to the database is instant.
Closing the Thesaurus dialog (which saves the thesaurus) takes maybe a second (big thesaurus based on the default IMatch thesaurus + maybe 2,000 extra keywords).

Deleting a 3-level deep keyword from the thesaurus with the "apply to database" option enabled and then selecting "Apply to entire database" in the prompt takes 44 seconds, which is OK for a database with almost one million files. It's not a super-frequent operation, and not many users have databases with one million managed assets.

I've also tested with a 100,000 files database and applying the changes to the entire database takes 2.5 seconds.

For your database, I would expect maybe 15 or 20 seconds?
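Here is a rough check of that estimate. It assumes run time grows linearly with the number of files. That is a simplification: the two measurements above are not exactly proportional (2.5 s × 9.2 ≈ 23 s, not 44 s).

```python
def linear_estimate(measured_s: float, measured_files: int, target_files: int) -> float:
    # Assumes time is proportional to file count; only a rough bracket.
    return measured_s * target_files / measured_files


from_large = linear_estimate(44, 920_000, 320_000)   # from the 920k-file test
from_small = linear_estimate(2.5, 100_000, 320_000)  # from the 100k-file test
print(round(from_large, 1), round(from_small, 1))    # 15.3 8.0
```

Both figures are per deleted keyword. By the same arithmetic, almost 15 minutes at roughly 10 seconds per deletion would correspond to something on the order of 90 keyword deletions.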

Do this:

- Switch IMatch to debug logging: Help menu > Support
- Repeat the deletion of a keyword from the thesaurus
- Afterwards, Help menu > Support > Copy Log file...
- ZIP the created file and attach.

I can then see what takes how long and maybe provide advice.

Always make sure that your virus checker has an exclusion for the folder (!) containing the database. If an on-access virus checker goes bonkers and scans the IMatch database on every write access, performance will be ruined.
-- Mario

MrPete

QuoteDeleting a keyword (3 levels deep) from the thesaurus without applying changes to the database is instant.
Closing the Thesaurus dialog (which saves the thesaurus) takes maybe a second (big thesaurus based on the default IMatch thesaurus + maybe 2,000 extra keywords).

Deleting a 3-level deep keyword from the thesaurus with enabled "apply to database" option and then selecting the "Apply to entire database" in the prompt takes 44 seconds. Which is OK for a database with almost one million files. It's not a super-frequent operation, not many users have databases with one million managed assets etc.

So... wouldn't it be much faster to modify this a bit? To delete a keyword:
  • Test if the keyword is in the database (very quick)
  • If in database: apply deletion to database. (In fact, since we know which files have the keyword, we don't need to apply to the entire database.)
  • If NOT in database: delete w/o touching DB.

The extra time for the test will be negligible, and it will save 44 seconds per keyword in a million-file database, or about 10 seconds per keyword in our 300k-file database.
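The proposal can be sketched with an inverted index (keyword to files), which is a standard technique. The sketch assumes the keyword stored on a file is exactly the keyword in the thesaurus. All names are hypothetical, and this is not IMatch's implementation.

```python
from collections import defaultdict


def build_index(files: dict[str, set[str]]) -> dict[str, set[str]]:
    # Inverted index: keyword -> names of the files that carry it.
    index: dict[str, set[str]] = defaultdict(set)
    for name, keywords in files.items():
        for kw in keywords:
            index[kw].add(name)
    return index


def delete_keyword(thesaurus: set[str], files: dict[str, set[str]],
                   index: dict[str, set[str]], keyword: str) -> int:
    """Remove `keyword` from the thesaurus and only from files that have it."""
    thesaurus.discard(keyword)
    affected = index.pop(keyword, set())  # the quick "is it in use?" test
    for name in affected:                 # empty for unused sample keywords
        files[name].discard(keyword)
    return len(affected)


files = {"a.jpg": {"Insects|Beetles"}, "b.jpg": {"Places|Paris"}}
index = build_index(files)
thesaurus = {"Insects|Beetles", "Places|Paris", "Sample|Flowers"}
print(delete_keyword(thesaurus, files, index, "Sample|Flowers"))  # 0
```

The exact-match assumption is the weak point, as the reply below explains.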

:)

Mario

You also have to consider features like keyword links, keyword group levels, keyword exclusion levels and suchlike. In other words, the keyword assigned to a file (searchable) is not necessarily the keyword as it appears in the thesaurus.
This process may not be as easy as you think it is. I remember having a really hard time dealing with all the special and edge cases when implementing this a year or two ago.
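This toy example shows why a simple lookup may not be enough. It uses a hypothetical rule, loosely modelled on the group/exclusion levels mentioned above: some thesaurus levels are dropped from the keyword stored on a file. IMatch's actual rules, and keyword links, are more involved. `assigned_form` is an invented helper.

```python
def assigned_form(path: str, omitted_levels: set[str]) -> str:
    # HYPOTHETICAL rule for illustration: levels named in `omitted_levels`
    # are dropped from the keyword that ends up stored on the file.
    return "|".join(level for level in path.split("|") if level not in omitted_levels)


index = {"Places|Germany": {"a.jpg"}}     # keyword as stored on files
thesaurus_path = "Places|Europe|Germany"  # same keyword as it appears in the thesaurus
omitted = {"Europe"}

naive = index.get(thesaurus_path, set())                           # misses a.jpg
mapped = index.get(assigned_form(thesaurus_path, omitted), set())  # finds a.jpg
print(naive, mapped)  # set() {'a.jpg'}
```

Even in this toy, each deleted thesaurus entry must be translated with its level attributes before the quick "is it in use?" test means anything.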

If you only want to delete a keyword and your use-case is simple:

- Select the corresponding @Keywords category in the Category View
- Select all files with <Ctrl>+<A>
- Press <U> to un-assign them from the keyword.

You can also select multiple categories (same level) and do this to remove multiple keywords at once.

Or, just select files in a File Window and then <Ctrl>+click the keyword in the Keywords Panel you want to remove from all selected files.

The thesaurus feature is aimed more at complex scenarios: moving branches, changing links or level attributes, etc.
-- Mario