Progress indicators and stop buttons when iMatch is completing tasks

Started by Mike, February 03, 2021, 12:59:27 AM

Mike

If possible, I suggest providing more detailed feedback when iMatch has something to do.

In some situations, a kind of progress bar appears which is very good. In others you can only see the small icons in the right corner indicating activity, but you have no idea how much % still has to be done.

One or several progress indicators (e.g. within the lower edge of the window) and the option to safely stop the current action would be very good.

For many hours (now eight) the computer has been working on deleting a large number of keywords from a group of 600 images, and from the start I have had no idea whether the task will be done soon or not until tomorrow morning. (I don't even know if this is the only task that keeps the computer busy, or if there is a lot more to do.) I'm not sure this long duration is normal for what is happening. In order not to destroy any work that has already been done, I have not yet forcibly interrupted the process. I'm still waiting...

Mario

Deleting keywords from a mere 600 files takes only a couple of seconds here.
Just tried with 1,000 files, adding and then removing a keyword. 4 seconds for each operation.

Are you sure your database is on your fastest disk and that your virus checker is not bringing IMatch down to a crawl...?

Mike

I have no idea what it was. I fell asleep yesterday and unfortunately the computer was still running (16% CPU) until this morning when I stopped iMatch hard.

a) There were around 2300 keywords that had to be removed from 600 files.

b) The data-driven category panel showed all keywords at the time when I gave the delete command.
This is what I did:
- I selected several keywords in the @ keyword tree, which resulted in 600 files being displayed.
- I selected those 600 files and then removed all of their keywords that were showing in the keywords panel.
(Observation: data-driven categories like the @Keywords generally lead to a slowdown.)

c) My database is on a fast SSD, but the files themselves are on both the SSD and a HDD. (at the moment there are about 370000 Files there)
d) I didn't have the feeling that Kaspersky Internet Security interfered. But I can't confirm it. Is there anything specific by which I could identify such interference?

Despite a hard stop, the DB seems to be OK. What was not finished I did again. I will continue to monitor the potential problem.
I still have to remove 42700 keywords because I want to radically change the keyword system. I hope the problem doesn't appear again. I will try to remove fewer keywords at once, even if that lengthens the process itself.


Mario

Remember to enable debug logging and post a ZIPped version of your log file. See log file
The log file will show us what IMatch is doing and the time each operation takes.

Deleting keywords is an in-database operation and thus very, very fast.

Mike

I repeated the procedure with fewer files.

a) I selected a larger number of keywords in the expanded data-driven @ keywords category on the left.
b) Then 60 files were displayed in the file window.
c) I selected them all so that all of their keywords are displayed in the keyword panel on the right. There were quite a few. I removed them all and then pressed Ctrl + S to save them to the database
d) iMatch work has started and has prevented normal access for about 10 minutes. Then everything was ready.

It was only 60 files and lots of keywords and it took 10 minutes. No wonder that yesterday with 600 files and a lot more keywords the effort increased exponentially.

After completing the task, I closed iMatch and saved the log. Can you see anything suspicious in the attached log file? For example, why could it have taken 10 minutes?

Mario

Your database has about 370,000 files and nearly 50,000 categories.

The first thing I see is that you have deleted one or more categories in the category view.
This operation took 110 seconds, almost two minutes.

Where do you delete "keywords"? In the Keywords Panel? In the Category View?
Which kind of category did you delete? A @Keywords category?

Mike

The database is in massive work. Different categorical systems from different times meet in it at the moment.
They are to be gradually replaced by a completely new classification system that I am currently developing.

Quote
Where do you delete "keywords"?

Depending on what I wanted to achieve, in the last few days I either deleted branches of the @Keywords category or deleted keywords in the keyword panel.
But there were always large groups at once. Large groups of files and large keyword swarms at once.

The long waiting times arose mainly when deleting in the keyword panel, but maybe that implied larger groups of keywords and several files.

I will test a few more things, e.g.:

a) Different procedures for deleting, and also whether it matters which windows are open during the deletion.
E.g. the data-driven category panel seems to have to be updated often, which may be a problem with many deletion processes.

b) I will see whether PDF files were included and whether writing metadata to them takes significantly more time (in Adobe Bridge that was the case back then).

c) Somehow I have to find out whether Kaspersky is slowing down my work ...

Jingo

Quote from: Mario on February 03, 2021, 04:38:19 PM
Your database has about 370,000 files and nearly 50,000 categories.

50,000 categories!?!?!?  How (and why) are these used...  I suppose perhaps a scientific database might have a ton of data categories... but wow.. must be a ton of work to manage these and catalog new items!

Mike

The 50,000 categories are just an unfortunate intermediate stage in a phase of massive restructuring. They are a summary of the outputs from multiple systems. Some were automatically converted into keywords from analysis codes and other research units, others were created in more complicated ways ...

At the moment I'm working on adapting all systems to each other and creating a common, much leaner new category system. It's not easy because I have to manage several worlds under one roof.

A particular challenge is that the categories not only have to work in iMatch, but they have to work similarly reliably within other systems. I would like to have to "translate" and "convert" as little as possible later.

I think there is still a lot to be done  ;)

Mario

Quote from: Mike on February 03, 2021, 10:12:11 PM
Depending on what I wanted to achieve, in the last few days I either deleted branches of the @Keywords category or deleted keywords in the keyword panel.
But there were always large groups at once. Large groups of files and large keyword swarms at once.

I did not see that in the log. Only you deleting one or multiple categories.
If these were @Keyword categories, IMatch may have to recalculate the entire @Keyword category hierarchy afterwards (because it is visible).

Quote
c) Somehow I have to find out whether Kaspersky is slowing down my work ...

Make sure the folder (!) containing the database is marked as an exception (to prevent the on-access scanner from scanning the entire database after each write, which IMatch does thousands of times a second).
See IMPORTANT: Virus Checkers

I've made a check. On my 800,000 files test database with 28,000 categories (most of them @Keywords), deleting a child of @Keywords takes about 1 second. Not 120 seconds as in your log file. Even at twice or three times the number of categories, I doubt that this will go up to more than 3 or 4 seconds.

I've also tried the same with a category containing 10,000 files. This time it took 2.5 seconds, because the metadata of 10,000 files has to be updated. Still on an 800,000-file database.

Mike

Thanks for the tips and tests!

@Keywords vs Keywords Panel

Yesterday I saw that deleting groups of children in @Keywords was noticeably faster than deleting groups of Keywords in the Keywords panel. (When deleting in the Keywords Panel, I always clicked the green confirmation button too).
When deleting in the Keywords Panel, I had the feeling of starting a higher number of processes than when deleting in @Keywords. Could it be that one procedure (Keyword Panel) tries to solve everything in a row and the other (@Keywords) postpones some work until later? Anyway, that might explain why @Keywords looks like it's done faster (even if it's maybe not true). At least it allows me to access the interface faster.

Problematic Files

I also discovered some files whose keywords could not be removed directly. Such files may have slowed down some processes.
When I try to delete keywords from such files at first it looks like it is working. They will then disappear from the file window and will reappear later when iMatch determines that it didn't work after all. I can also see that the corresponding @Keywords children are made blank and later the files are back in the categories.

I wonder if such "loops" could have caused long delays because they imply a lot of additional processes - especially if they contain a large number of keywords.
When I have the time, I will try to specifically test whether the processing time is relevantly shorter / longer depending on whether I am using normal or problematic files.

Antivirus
Quote
Make sure the folder (!) containing the database is marked as an exception

I added the database folder as an exception to Kaspersky. I think iMatch works faster, but I'm not exactly sure because some conditions have changed. I will have to test some things specifically.
Since I didn't have time, I completely stopped Kaspersky control for the DB folder. A screenshot is attached. Should I leave it like that in your opinion, or do you think that some of the listed functions can be switched on again?

Mike

I forgot to ask: how do you find the place in the log that shows the duration of the deletion process? I'm asking because I couldn't find the 120 s you mentioned.
Maybe then I can compare several such logs with each other before I post again.
Thank you!

jch2103

Quote from: Mike on February 04, 2021, 10:35:40 PM
Problematic Files

I also discovered some files whose keywords could not be removed directly. Such files may have slowed down some processes.
When I try to delete keywords from such files at first it looks like it is working. They will then disappear from the file window and will reappear later when iMatch determines that it didn't work after all. I can also see that the corresponding @Keywords children are made blank and later the files are back in the categories.

I wonder if such "loops" could have caused long delays because they imply a lot of additional processes - especially if they contain a large number of keywords.
When I have the time, I will try to specifically test whether the processing time is relevantly shorter / longer depending on whether I am using normal or problematic files.

It sounds like you may have legacy IPTC/XMP conflicts, which can cause the problems you mention. You may want to run the Metadata Analyst app from the App Manager on some of those files to confirm if that's an issue. It's possible to strip the legacy IPTC data using the ExifTool Command Processor. I once had a bunch of files with this issue; cleaning them up fixed the issues I was having.
John

Mike

Quote
It sounds like you may have legacy IPTC/XMP conflicts

You might be right - at least for some of them -, the metadata analyst suggests so.

I looked at the Exif Tool Command Processor. But I shouldn't have touched it at this time of day because I'm far too tired after only sleeping an hour yesterday.
I accidentally removed the first two presets (Delete all metadata and all keywords... or something like that).
Is it possible to return to the tool's default settings somewhere?

Mike

All along I meant Categories Tab and not Categories Panel.
I realized that I always wrongly said panel instead of tab in my descriptions, which are two different things in iMatch. Sorry!

Mario

800,000 files database.
Looking at a @Keyword category with 10,000 files in the Category View.
Selecting all 10,000 files and pressing <U> to un-assign them from the category. This also removes the corresponding keywords from the files.
Time: 3.8 seconds.
You can search the log file for CIMTaskGroup::UnAssign to find the corresponding entry (the one with the < and the duration in []), if you want to repeat this test.

Doing the same while in the Media & Folders View, with the Keyword Panel, takes 20 seconds. This is due to additional UI update work that needs to be done (the Category Panel is open in my case).
IMatch must also check for version chain propagation in my case, because any of the files I remove keywords from may be a master for a version.
Still, deleting a keyword from 10,000 files in 20 seconds, with all the secondary data depending on this (versions, @Keywords, collections, file history updates, ...) is pretty good.

You can find the relevant entry in the log file afterwards by searching for CIMControlContainerWnd::WriteBack, again the closing entry (with the < and the duration in []).
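
For comparing several logs with each other, the lookup described above could be scripted. The sketch below is not based on the real IMatch log layout. It only assumes what the post states: the closing entry of an operation contains the function name, a `<` marker, and the duration in square brackets. The sample lines, the `ms` unit, and the helper name `closing_durations` are made up for illustration.

```python
import re

# Assumption: the duration is the last "[...]" group on the line.
BRACKETED = re.compile(r"\[([^\]]*)\]")


def closing_durations(lines, function_name):
    """Return the bracketed duration text of every closing entry for function_name.

    A line counts as a closing entry if it contains the function name and a '<'
    character, as described in the post. The real log layout may differ.
    """
    found = []
    for line in lines:
        if function_name not in line or "<" not in line:
            continue
        groups = BRACKETED.findall(line)
        if groups:
            found.append(groups[-1])
    return found


# Hypothetical sample lines, NOT copied from a real IMatch log.
sample = [
    "> CIMTaskGroup::UnAssign",
    "< CIMTaskGroup::UnAssign [3800ms]",
    "< CIMControlContainerWnd::WriteBack [20000ms]",
]
print(closing_durations(sample, "CIMTaskGroup::UnAssign"))
```

To use it on a real log, pass the lines of the file (for example `open(path, encoding="utf-8", errors="replace")`) instead of `sample`, and adjust the matching if the actual entries look different.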

If this takes much longer on your system, your computer is slower than mine (I use a performant PC with fast M.2 SSD storage and 12 cores), or something on your system is severely affecting the file system and I/O performance of IMatch. Or maybe you just have too much of everything.
300,000 files. 50,000 categories, each category possibly being invalidated when you delete a keyword etc. There are limits, even for IMatch.

I will move this thread to General Discussions, because it is not a feature request.

Mike

Thanks for the hints. I'll see how I can optimize the situation.

jch2103

Quote from: Mike on February 05, 2021, 12:14:52 AM
Quote
It sounds like you may have legacy IPTC/XMP conflicts

You might be right - at least for some of them -, the metadata analyst suggests so.

I looked at the Exif Tool Command Processor. But I shouldn't have touched it at this time of day because I'm far too tired after only sleeping an hour yesterday.
I accidentally removed the first two presets (Delete all metadata and all keywords... or something like that).
Is it possible to return to the tool's default settings somewhere?

I don't remember how to reload the ECP presets... [@Mario: ?]

But here is the code for removing legacy IPTC data:

Delete legacy IPTC (IIM) metadata

# im-warn
-overwrite_original_in_place
-iptc:all=
-charset
filename=UTF8
{Files}
John
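
Outside the ExifTool Command Processor, the same arguments can be passed to ExifTool directly. This is a minimal sketch, assuming `exiftool` is installed and on the PATH. `# im-warn` and `{Files}` belong to the IMatch preset, so they are replaced here by an explicit list of file paths. The function name and the example path are hypothetical.

```python
def strip_legacy_iptc_command(files):
    """Build the ExifTool argument list that deletes all legacy IPTC (IIM) tags."""
    return [
        "exiftool",
        "-overwrite_original_in_place",  # overwrite the files, keep no "_original" backups
        "-iptc:all=",                    # delete every tag in the IPTC group
        "-charset", "filename=UTF8",     # treat the file names as UTF-8
        *files,
    ]


# Hypothetical path; nothing is executed here, the command is only printed.
cmd = strip_legacy_iptc_command(["C:/test/copy-of-image.jpg"])
print(" ".join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would actually run it. Because the originals are overwritten, it is safer to try this on copies first.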

Mario

1.  There is no reload. The presets are filled initially from special resource file entries on first open of the ExifTool Command Processor.
I did not anticipate that users would accidentally delete multiple presets.

But it's just text. Let us know which presets you have deleted and we can show you the original content.

2.  Discrepancies between legacy IPTC and XMP keywords (often caused by half-assed software which gives a shit for metadata standards) have no impact on how fast IMatch can remove keywords in the database.
They can only cause secondary write-backs or maybe even unresolvable keyword mappings which need to be fixed manually by the user.

Mike

Sorry, I didn't see the answers!

Thank you both!

Accidentally deleted presets: In the meantime I managed to sleep longer than last time, so I was fit enough to understand the tool and replace what I deleted  ;)

Quote
Discrepancies between legacy IPTC and XMP keywords.

With the help of the ECP I removed the errors from the problem files.
But now I'm thinking about tracking down all such files and removing legacy IPTC entries. Perhaps one or many data-driven categories could be used as a rat catcher.
Has anyone ever specifically done something like this: looking for files which contain legacy IPTC metadata? Is there a certain TAG particularly suitable for this? Or is there a better way to find them all? E.g. "search everywhere in metadata" in the File Window using special terms?

Mario

Removing legacy IPTC is a good thing, in principle.
That is, if you are sure that none of your clients, friends, colleagues, customers etc. depend on it. Although IIM3 IPTC was retired 15 years ago, there are still many applications and services which can only handle this format, but not XMP.

To find all files with legacy IPTC, just use the Metadata Value with the IPTC application record tag.
This tag should be written by all applications embedding an IIM3 IPTC record. Needless to say, not all applications did. And some which did wrote perfect nonsense or fantasy values in the tag. Metadata mess.
To find all files in the current scope (could be the entire database!) use 



As you can see, even in my quick test, the files in the scope had so many different values for this tag...4 is the only official value!
If you invert this filter, you get all files without a value for this tag (files which most likely have no legacy IPTC).
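
A similar overview can be approximated outside IMatch with ExifTool, which reports the record version as `IPTC:ApplicationRecordVersion` (presumably the tag meant above). The sketch assumes the output of `exiftool -json -G1 -IPTC:ApplicationRecordVersion -r <folder>` has been captured as text. The sample JSON and the function name are hypothetical.

```python
import json
from collections import defaultdict

TAG = "ApplicationRecordVersion"


def group_by_record_version(exiftool_json):
    """Map each ApplicationRecordVersion value to the files carrying it.

    Files without the tag are collected under the key None.
    """
    groups = defaultdict(list)
    for entry in json.loads(exiftool_json):
        value = None
        for key, raw in entry.items():
            # Accept the key with or without a group prefix such as "IPTC:".
            if key.split(":")[-1] == TAG:
                value = str(raw)
        groups[value].append(entry["SourceFile"])
    return dict(groups)


# Hypothetical sample output, not taken from a real run.
sample = """[
  {"SourceFile": "a.jpg", "IPTC:ApplicationRecordVersion": 4},
  {"SourceFile": "b.jpg", "IPTC:ApplicationRecordVersion": 2},
  {"SourceFile": "c.jpg"}
]"""
for version, files in group_by_record_version(sample).items():
    print(version, files)
```

The files listed under `None` correspond to the inverted filter: they have no value for the tag and most likely carry no legacy IPTC record.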

Mike

That's great, thank you!

I was already expecting some chaos when I looked for it, knowing the existence of multiple variants.
As beautiful as diversity is, this is hardly the right place for it. They seem to have mutated as diligently as viruses.

I will not delete anything without first looking for possible and perhaps even unobvious implications. Good point!

jch2103

Or just use the IMatch Dashboard (View/App Views/IMatch Dashboard). Look for Quality/Standard Views/Files with legacy IPTC data. No extra work!

John

Mario

Quote from: jch2103 on February 06, 2021, 08:18:16 PM
Or just use the IMatch Dashboard (View/App Views/IMatch Dashboard). Look for Quality/Standard Views/Files with legacy IPTC data. No extra work!

Shush! You're talking about features not yet available for the normal user base!