300'083 files in the IMatch-database

Started by sinus, November 17, 2020, 12:19:19 PM

sinus

After my shooting today I have now
300'083 files in my DB.

I know there are users with a lot more files, but nevertheless it is quite nice to see how well IMatch still works.
No problems.  :)
Best wishes from Switzerland! :-)
Markus

Mario

Very good.

The problem is not the number of files. My largest test database has over 700,000 files now. And there are quite a number of users who manage one million or more files.

But you can run into problems when you over-use advanced features like complex data-driven categories, formula-based categories based on tag data etc., or somehow accumulate 100,000 categories.
Say updating a data-driven category takes 3 seconds for a 100,000-file database. The same operation takes about 9 seconds for a 300,000-file database, and 30 seconds for a database with 1 million files.
Sorting 500,000 files takes approximately five times as long as sorting 100,000 files.
This is just how it is.
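
The numbers above follow a simple proportional rule. As a minimal sketch, assuming the time grows linearly with the file count (the function name and the linear model are illustrative assumptions, not measurements from IMatch):

```python
def estimate_seconds(base_files: int, base_seconds: float, target_files: int) -> float:
    """Linear extrapolation: time is assumed to grow in proportion to the file count."""
    return base_seconds * target_files / base_files


# Reference point taken from the post: 3 seconds for 100,000 files.
for files in (100_000, 300_000, 1_000_000):
    seconds = estimate_seconds(100_000, 3.0, files)
    print(f"{files:>9,} files: about {seconds:.0f} s")
```

Real timings depend on the hardware and on what the category actually does, so treat this only as a rough rule of thumb.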

I know you do a lot of fancy stuff in your File Window layouts, using metadata variables and so on. Although IMatch does some smart things here, there is no free lunch. What is fast enough for a 200,000-file database may feel a lot slower for a 500,000-file database.

It is always possible to shoot yourself in the foot by over-using 'expensive' features like data-driven categories. Every time you change the metadata of even one file, most categories are invalidated and need to be re-calculated (when they are needed the next time). Combine this with category-based variables in the File Window, and each time you change a keyword or metadata, all categories are re-calculated immediately, because the File Window needs them.
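
The difference between "re-calculated when needed the next time" and "re-calculated immediately" can be illustrated with a small cache sketch. This is not IMatch code. The class `LazyCategory`, its methods and the tiny dictionary database are all hypothetical and only model the behavior described above:

```python
class LazyCategory:
    """Hypothetical category whose file list is recomputed only on demand."""

    def __init__(self, name, predicate):
        self.name = name
        self.predicate = predicate   # decides whether a file belongs to the category
        self._files = None           # cached result; None means "invalid"
        self.recalculations = 0      # counts the expensive recomputations

    def invalidate(self):
        """Called after a metadata change; cheap, it only drops the cache."""
        self._files = None

    def files(self, database):
        """Return the members, recomputing them only if the cache is invalid."""
        if self._files is None:
            self._files = [f for f, meta in database.items() if self.predicate(meta)]
            self.recalculations += 1
        return self._files


database = {
    "a.jpg": {"keywords": {"beach"}},
    "b.jpg": {"keywords": {"city"}},
}
beach = LazyCategory("Beach", lambda meta: "beach" in meta["keywords"])

# Three metadata edits while nothing reads the category: no recalculation at all.
for _ in range(3):
    database["b.jpg"]["keywords"].add("travel")
    beach.invalidate()
lazy_count = beach.recalculations

# The same edits while a consumer (think: a category-based variable in a file
# list) reads the category after every edit: one recalculation per edit.
for _ in range(3):
    database["b.jpg"]["keywords"].add("travel")
    beach.invalidate()
    beach.files(database)
eager_count = beach.recalculations - lazy_count

print(lazy_count, eager_count)
```

In this toy model the invalidation itself costs nothing. The cost only appears once something asks for the category after every change.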

Most users will never experience any issues, though. The average database size is about 100,000 to 150,000 files. Computers get faster all the time, and I continuously improve performance when I find a better way to implement something in IMatch.

Jingo

Quote from: Mario on November 17, 2020, 01:10:12 PM

Curious - is there any way to "identify" how these expensive features are impacting one's database? I don't use data-driven categories beyond those supplied by default, but I know in the past you mentioned they are used even when I'm not "actively" using them (i.e., just by having the panel open, for example). It would be great if we could see what resources are being used at a given time. That way, we could shut them down, delete expensive ones, etc.

Mario

You can search the log file for lines containing #sl. This tag indicates operations which took longer than 5 seconds (which can be totally OK, e.g. when loading a database).
The special #SLOWCAT tag is added when a category takes more than 2 seconds to recalculate.
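
If you prefer a script over a text editor search, something like the following minimal sketch could collect those lines. Only the two tags come from the post above. The function name, the exact tag spelling and case, and the sample lines are assumptions, since the thread does not show the real log format:

```python
import re
from collections import Counter

# Matches the two tags mentioned above. The spelling and case are assumed to be
# exactly as written in the post; adjust the pattern if the real log differs.
TAG_PATTERN = re.compile(r"#SLOWCAT|#sl\b")


def find_slow_lines(lines):
    """Return (line_number, tag, text) for every line carrying a slow-operation tag."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = TAG_PATTERN.search(line)
        if match:
            hits.append((number, match.group(0), line.rstrip()))
    return hits


# Made-up sample lines, for illustration only (not real IMatch log output).
sample_log = [
    "made-up line: database loaded #sl",
    "made-up line: nothing slow here",
    "made-up line: category recalculated #SLOWCAT",
]

hits = find_slow_lines(sample_log)
counts = Counter(tag for _, tag, _ in hits)

for number, tag, text in hits:
    print(f"line {number}: {tag}: {text}")
print(dict(counts))
```

To scan a real log, pass an open file instead of the sample list, for example `find_slow_lines(open(path, encoding="utf-8", errors="replace"))`, where `path` is wherever your IMatch log file is stored.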

Note that IMatch updates panels like the Category Panel in the background and may even delay the update when the database is currently too busy or IMatch is indexing images (which would render the categories invalid again after 0.5 seconds anyway). The same is true for the Collections panel. But variables, category formulas and other features which access categories or collections may require an immediate update. Since this is all done in parallel and in the background, in combination with other background tasks like indexing, relation updates, face recognition, reclustering and maybe write-back, the interactions are often quite difficult to follow and reliable measurements are hard to come by.

I get only a very, very small number of reports about IMatch not performing well.