Refresh relations

Started by hluxem, November 30, 2018, 05:41:59 PM

hluxem

Hello Mario,
In the past couple of days I worked quite a bit with relations, and usually refreshing a relation finished really fast. Once in a while, though, it started to act weird and the estimated time went up to hours, even when only a couple of files were selected. I then canceled the dialog after a while and either ran it again or selected just one file, and after that the refresh was fast again.
Today I finally managed to switch to debug logging and catch an instance where it happened. See the attached log file for the last 3 things done:


  • Selected 28 files and updated relations. The dialog said it was updating relations for 6097 files, although only 28 were selected. It showed hours to finish, and at some point I canceled the refresh.
  • Selected one of the previous files; relation updated instantly
  • Selected the remaining files; relation updated instantly


Not sure if anything shows up in the log file. I can work around it, but thought there may be something interesting in the log.

Heiner

Mario

#1
No log attached.
Manually refreshing relations is usually only needed when you have changed your file relation rules and you need to make IMatch rescan all selected files to apply the new relations.
The behavior also depends on which relations you use, where you make IMatch search for versions etc. Include screen shots of your file relation setup.

The last user reporting slow performance related to relations had two rules which forced IMatch to search the entire database for each file, causing millions of file operations during each folder update...

hluxem

Sorry, I hate when that happens.
Log file and screen shot attached.
This is not a general performance issue with relations; I'd say 90% of the time they come back really fast. Just sometimes it takes forever: the dialog always shows more files than selected, and the displayed time goes up into the hour range.

I do use the command refresh relation when I initially want to create the master version relation for a folder.


Heiner

Mario

#3
Quote: I do use the command refresh relation when I initially want to create the master version relation for a folder.

This is automatic. IMatch applies relations automatically when folders are added or files are updated.
You only need to refresh the relations manually when you change the rules after the files are already in the database.

Why do you let IMatch search in so many folders for versions?
For example, if you have a file in ...Digital Camera\2018 which may have versions, is it likely that versions are found in the folder ...\2009?

This looks to me as if IMatch has to process many files for each master to find the versions, and most of the files cannot be a version...

For example, IMatch reports

1745 masters, 1 definitions, 1728 links found. 145,826,304 potential file links (versions) analyzed.

So it had to analyze 145 million (!) potential file links to find all versions for the 1745 masters, because it has to search so many folders for each master, and each file within these folders. Is this really necessary?
This took over 90 (!) seconds to complete. Yikes.

Even for 17 masters IMatch has to check almost 1.5 million files to detect the versions. This takes about 1 second.

I recommend reconsidering where the potential versions for a file can exist.

If you have a master in the folder ...2018, where do you store the versions?
In the 2018 folder or a sub-folder of it? In that case, just searching the "Master Folder" and one level down would be sufficient. No need to search all folders created in the past decade.

hluxem

I do have a parallel structure for my original files and the developed jpg files, with a structure like "...originals\year\month" and "...developed\year\month". Sometimes there is another level under month. So far I just had IMatch look in a list of folders.

While I did not really consider it a speed problem, besides the sporadic cases when it seems to react differently, it makes sense to limit the search. Not sure how you got to 145 million potential links, but that doesn't matter.

I just read up on the folder pattern definition and changed it to the folder patterns "developed\{d1}\{d0}" and "developed\{d2}\{d1}\{d0}" to catch the cases with an additional sub folder. Is there a better way, or can I use a replacement expression that takes the original path and replaces "original" with "developed"?


Heiner




Mario

Quote: besides the sporadic cases when it seems to react differently

Checking links sometimes involves file system calls and, of course, many database operations. When IMatch is re-calculating a data-driven category or updating a user interface element like the category panel at the same time, the database may be busy and checking for versions may be slowed down. It may even interfere with the user interface.

Quote: Not sure how you got to 145 million potential links,

That's hard info from your log file. You have refreshed relations (or scanned/rescanned a folder) and IMatch has identified 1745 masters (by the criteria in your rule).
Then IMatch had to check each file in the many folders you have specified as potential version folders to find out whether it is a version: 1745 x (number of files in all the folders combined) => 145,826,304 files to check. 1728 versions were found.
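
To make the arithmetic concrete, here is a rough sketch in Python (not part of IMatch). It only uses the figures quoted from the log in this thread; the assumption that the cost scales linearly with the number of masters is mine, made for illustration.

```python
# Figures quoted from the log in this thread.
masters = 1745
potential_links = 145_826_304
elapsed_seconds = 90  # reported as "over 90 seconds", so the rate below is an upper bound

# Average number of candidate files IMatch had to look at per master.
files_per_master = potential_links / masters

# Approximate checking rate (assumption: the whole time was spent on checks).
links_per_second = potential_links / elapsed_seconds


def estimated_checks(master_count: int) -> int:
    """Estimate the checks needed for a given number of masters (linear scaling assumed)."""
    return round(master_count * files_per_master)


print(f"{files_per_master:,.0f} candidate files per master")
print(f"{links_per_second:,.0f} checks per second at most")
print(f"{estimated_checks(17):,} checks for 17 masters")
```

With these numbers each master drags in tens of thousands of candidate files, which is why restricting the search to one or two folders per master makes such a difference.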

Quote: but that doesn't matter.

This matters a lot because the above check took 90 seconds, probably keeping the database / CPU maxed out the whole time and maybe delaying other operations like user interface updates a lot.

If your file layout is something like "original...\folder with masters" and your versions are in "derived...\folder", you should try to find a rule that somehow 'produces' derived from original, instead of searching most of your hard disk every time for convenience. The folder and directory patterns available for versions should help you with that.
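
The idea of 'producing' the version folder from the master folder can be sketched as follows. This is plain Python for illustration only, not IMatch rule syntax; the function name, the drive letter and the folder names in the usage comment are hypothetical and merely mirror the "originals\year\month" / "developed\year\month" layout described above.

```python
from pathlib import PureWindowsPath


def candidate_version_folder(master_folder: str,
                             src: str = "originals",
                             dst: str = "developed") -> str:
    """Map a master folder to the single folder where its versions are expected.

    Every path component equal to `src` (case-insensitive) is swapped for `dst`;
    all other components (year, month, ...) are kept unchanged.
    """
    parts = PureWindowsPath(master_folder).parts
    swapped = [dst if part.lower() == src.lower() else part for part in parts]
    return str(PureWindowsPath(*swapped))


# Hypothetical example path:
# candidate_version_folder(r"D:\Photos\originals\2018\11")
# yields the matching folder below "developed" with the same year and month.
```

The point is that each master maps to exactly one candidate folder, so only the files in that folder need to be checked instead of every folder in a long list.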

You can see how efficient your setup is by searching the log file for

CIMRelationManager::UpdateRelations

The data tells you how many files IMatch has to analyze for each master or set of masters. Ideally it is only one folder and hence only a few hundred files per master.
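
If the log is large, a small script can pull these entries out. The following is a minimal sketch in Python, assuming the summary text has the form quoted earlier in this thread ("1745 masters, 1 definitions, 1728 links found. 145,826,304 potential file links (versions) analyzed.") and that it appears on the same line as CIMRelationManager::UpdateRelations. Both points are assumptions about the log layout, and the file name in the usage comment is hypothetical.

```python
import re
from typing import Iterable, Iterator

MARKER = "CIMRelationManager::UpdateRelations"

# Pattern for the summary text as quoted in this thread (assumed log format).
STATS = re.compile(
    r"(?P<masters>[\d,]+) masters, (?P<definitions>[\d,]+) definitions, "
    r"(?P<links>[\d,]+) links found\. (?P<potential>[\d,]+) potential file links"
)


def relation_stats(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the parsed counters for every matching UpdateRelations log line."""
    for line in lines:
        if MARKER not in line:
            continue
        match = STATS.search(line)
        if match is None:
            continue  # marker present, but no summary in the assumed format
        stats = {key: int(value.replace(",", ""))
                 for key, value in match.groupdict().items()}
        # Average number of candidate files analyzed per master.
        stats["files_per_master"] = stats["potential"] / max(stats["masters"], 1)
        yield stats


# Usage (hypothetical file name):
# with open("imatch_log.txt", encoding="utf-8", errors="replace") as log:
#     for entry in relation_stats(log):
#         print(entry)
```

A files_per_master value in the hundreds indicates a tight rule; values in the tens of thousands suggest the rule searches far more folders than needed.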