IMatch database statistics poll/survey

Started by axel.hennig, January 14, 2017, 09:28:12 PM


axel.hennig

Hi,

this post is intended to be a "statistics" post.

In some other photools.community threads I've read how many pictures some people have (sinus: 230,000; dpop100: 523,750), and sometimes Mario (IMatch Developer and Administrator) comments on this with "enterprise level amount of assets" or notes that "many stock photo agencies out there have much less images in their archives".

See other posts:
- https://www.photools.com/community/index.php?topic=5997
- https://www.photools.com/community/index.php?topic=6309
- https://www.photools.com/community/index.php?topic=6083
- https://www.photools.com/community/index.php?topic=4852
- https://www.photools.com/community/index.php?topic=5624

I don't know how many images stock photo agencies have, but I can imagine that a private person can have more images than a stock company. Perhaps some private people just don't delete "bad" images and stock agencies do, or ...

Mario sometimes says that if a user has that many images or categories in his IMatch database, he should think about switching to enterprise software.

Now, I just want to know what kind of databases (size, number of images, folders, categories, ...) IMatch users have. I'm just interested in something like "95% of all IMatch users have fewer than 70,000 images in their database" or "80% of all IMatch users have more than 150,000 images in their database" or ...

So if you are also interested in these things just share your information here (how to get the information is described in the screenshot). And I hope not only those people who think that they have "a big database" share their information.

I have two IMatch databases, one for images and one for videos.

My image database:
File Size: 4.71 GB
Folders: 1,351
Files: 103,666
Categories: 10,892
Total Size: 307.21 GB

My video database:
File Size: 31.89 MB
Folders: 119
Files: 682
Categories: 1,270
Total Size: 79.93 GB

Best wishes from Munich,
Axel

Kucera

And I thought mine was big :)
Images only.

File Size:698.45 MB
Folders:620
Files:12,623
Categories:359
Total Size:17.90 GB

And that's before I get rid of the duds ...
Looking forward to seeing the stats.
Regards  Emil

jch2103

Images only.

File Size: 2.24 GB
Folders: 978
Files: 55,444
Categories: 3,805
Total Size: 462.70 GB
John

pajaro

Images only.

File Size: 9.90 GB
Folders: 2,128
Files: 138,721
Categories: 2,382
Total Size: 800.72 GB

Mario

Thanks to the changes I've implemented for IMatch 6, we now have a lot more room until we reach the top.
The critical combination was hundreds of thousands of files in combination with tens of thousands of categories.

500,000 files in a database. No problem.
20,000 categories in a database? No problem.
20,000 categories in a combination with 500,000 files? Potential problem.

Categories need to be super-duper fast. And this speed came from keeping them in memory all the time. The categories have worked that way since I invented them in 1998. At that time a large database was 20,000 files  :)

For some databases I've seen (350,000 files, 20,000 categories) IMatch needed 800 MB RAM just to keep the categories in memory. Not good. But I have a solution for that.

Yesterday I created a database with 90,000 categories (ugh!) and ~100,000 files.
Each file had at least 10 hierarchical keywords and 5 to 10 other categories.
Just to create a real stress point. IMatch needed about 50 MB to manage these 90,000 categories.
I consider this problem solved. With a minimum performance impact. Yay!
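
As a rough back-of-the-envelope check of the figures above, the memory per category can be estimated by dividing the quoted totals by the category counts. This is only a sketch: it assumes decimal megabytes, the function name is made up for illustration, and the two databases had different file counts (350,000 vs. about 100,000), so the comparison is indicative rather than exact.

```python
MB = 1_000_000  # assumption: decimal megabytes


def bytes_per_category(total_mb: float, categories: int) -> float:
    """Average memory per category, in bytes."""
    return total_mb * MB / categories


# Figures quoted in the post above.
old_cost = bytes_per_category(800, 20_000)  # older database: 800 MB, 20,000 categories
new_cost = bytes_per_category(50, 90_000)   # stress test: 50 MB, 90,000 categories

print(f"before: {old_cost:,.0f} bytes per category")
print(f"after:  {new_cost:,.0f} bytes per category")
print(f"reduction factor: about {old_cost / new_cost:.0f}x")
```

On these numbers the average drops from tens of kilobytes per category to well under one kilobyte.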

(Of course 90,000 categories are not really something you want to have - because the Windows tree control used for the Category View and Category Panel becomes really, really slow).


IMatch can manage a lot of files. Without degrading performance too much.
Some things of course become slower when more images are added.
It takes roughly twice as long to search 200,000 files as it takes to search 100,000 files.
It takes twice as long to update a data-driven category if you have 200,000 instead of 100,000 files.
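
The roughly linear scaling described above can be sketched as a simple proportional estimate. The baseline time is a hypothetical placeholder, not a measured IMatch figure, and the function name is invented for this example.

```python
def estimate_seconds(baseline_seconds: float, baseline_files: int, target_files: int) -> float:
    """Scale a measured time linearly with the number of files."""
    return baseline_seconds * target_files / baseline_files


# Hypothetical baseline: a search that takes 2 seconds on 100,000 files.
BASELINE_SECONDS = 2.0
BASELINE_FILES = 100_000

for files in (100_000, 200_000, 400_000):
    seconds = estimate_seconds(BASELINE_SECONDS, BASELINE_FILES, files)
    print(f"{files:>7,} files: about {seconds:.1f} s")
```

To use it, measure one operation on your own database and plug that time in as the baseline.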

On the other hand, IMatch benefits from modern hardware.
Keeping a database on SSD storage is a massive performance boost.
IMatch also utilizes all processor cores for functions such as indexing files, filtering, category calculations, data-driven categories etc.
This allows it to scale very well and to stay speedy even for large databases.


"Enterprise-level"

This is a term I use when somebody tries to manage 300,000 to 500,000 assets (images) in a US$110 application like IMatch.

Most of the other big DAM vendors don't mention prices on their web sites. You ask them for a quote and a consultant will contact you. This is usually a sign of "costs a fortune".
If you contact a company like Canto, AssetBank, Extensis, FotoWare, Widen etc. and tell them you want to manage 500,000 files in their cloud or on-premises (on your PC) they will start a project.

I recommend you ask them if their systems can handle 300,000 or more files on your hardware. And what the cost will be. This will be very interesting  :D
I would guess that they consider 100,000 managed "assets" as a large-scale DAM.

Here is some info about other DAM products and some feature ranges and price points:

https://www.thirdlight.com/articles/dam-cost


PS.: My largest database has 380,000 files. 12 GB on disk if I recall correctly.
The database file size (or the size of the managed files) has no impact on performance. It's only the file count, category count etc. and the amount of metadata managed per file that really impact performance.

DigPeter

File Size:2.57GB
Folders:1615
Files:54646
Categories:6030
Total Size:176.29 GB

Mees Dekker

File size: 2.94 Gb
Folders: 1099
Files: 71.008
Categories: 2243
Total size: 929.31 GB

jeknepley

I'd like to participate, but 1st a question - How do you gather the data as shown in the replies?

Mario

All information is available in the Information & Activity panel.

Jingo

I'm curious about something... and perhaps I should already know the answer but.... My computer has 16GB of memory... looking at my system, with IMatch running, I have 65% of that memory still free and available to apps.  IMatch is only using 290,000 KB of memory.. but even if it used 3GB of memory - I'd still have 40% available...  Why is IMatch using 800MB of memory (just 1/16th of what I have in my system) an issue?  If someone has a large DB with tons of categories, their problem would be solved with a $40 upgrade and a stick of memory?

Just to also add to the stats:

Images (JPG) mainly with about 100 videos mixed in:

File size: 2.52 Gb
Folders: 158
Files: 62,610
Categories: 1156
Total size: 291.75 GB


Mario

IMatch can use up to 3.5 GB of memory. That's a lot.

The 800 MB in my example was for the categories alone. IMatch may also need another GB for the database itself and its internal memory caches. Which means that you'll end up with only ~ 1.5 GB left after the database has loaded. If you now process folders in the background while you also use the Viewer, memory may become scarce (as in the example of sinus).

Modern RAW files may need 100 or 150 MB RAM each. IMatch processes multiple images at the same time in the background when it scans folders. Combine that with a RAW in the Quick View panel and several RAW files cached in the Viewer and you might cause out of memory conditions for IMatch.
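
The memory budget described above can be worked through as simple arithmetic. This is a sketch using only the round figures from the post, in decimal megabytes; the variable and function names are made up for illustration. Plain subtraction gives about 1.7 GB, slightly more than the roughly 1.5 GB quoted, which presumably leaves room for other overhead.

```python
# Round figures from the post, in MB (decimal units assumed).
PROCESS_LIMIT_MB = 3500   # "IMatch can use up to 3.5 GB of memory"
CATEGORIES_MB = 800       # categories alone in the extreme example
DATABASE_MB = 1000        # "another GB" for the database and its caches

remaining_mb = PROCESS_LIMIT_MB - CATEGORIES_MB - DATABASE_MB


def raw_files_that_fit(free_mb: int, mb_per_raw: int) -> int:
    """How many decoded RAW files fit into the free memory."""
    return free_mb // mb_per_raw


print(f"left after loading: {remaining_mb} MB")
for mb_per_raw in (100, 150):
    count = raw_files_that_fit(remaining_mb, mb_per_raw)
    print(f"at {mb_per_raw} MB per RAW: about {count} files in memory at once")
```

With background indexing, the Quick View panel and the Viewer cache all holding RAW files at once, a limit of roughly a dozen images is easy to hit.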

It may even get worse if the Windows RAM is fragmented or you run applications like Photoshop which by default suck up 50% of the RAM you have in non-cooperative mode (Windows cannot use this RAM as long as PS runs). This may cause memory fragmentation and Windows may be unable to give IMatch 3.5 GB if the system has only 8 GB. This happens far less often on computers having 16 GB or 32 GB RAM. 16 GB is the minimum these days I guess.


That's of course an extreme example.
I recall one or two cases over the past two years where the categories were really an issue. Always very large (300,000+ files) databases with 15K or 20K categories.

But databases become bigger and the number of categories can easily go into the 30K when users combine large keyword sets with several data-driven categories.
Reducing the category memory usage was always on my list, and two weeks ago I had a good idea that did not require me to rewrite the tried-and-true category code that is so reliable and fast.
So the category memory usage is now down to near zero. This gives IMatch a lot more room to breathe, even for databases hitting 500,000 or more files. And that's really a number.

The problem would be solved by porting IMatch to 64 Bit. Sigh.
Which is doable of course. But I fear that many of the 3rd party libraries I use are not available for 64 Bit or have other issues. Such a port may cost several months of work. And may cause all kinds of side-effects and hard to find bugs.

This needs to be planned and run alongside the normal development. My general long-term plan is to do this first for IMatch Anywhere, because here the entire user interface (what you know as IMatch) does not need to be ported. And that is over 65% of the code I don't need to worry about.

jeknepley

File Size:8.84 GB
Folders:8,657
Files:259,810
Categories:3,132
Total Size:1.72TB

percythomas

File Size: 989.48 MB
Folders: 414
Files: 31,977
Categories: 1,628
Total Size: 234.09 GB

Jingo

Quote from: Mario on January 15, 2017, 03:09:44 PM
IMatch can use up to 3.5 GB of memory. That's a lot.


Thx Mario.. yeah.. forgot about the 32bit restrictions.  I know porting to 64bit is a big deal... XYplorer is also 32bit and the developer has long delayed moving to 64bit for the same reasons... lots of work, libraries that would need to be replaced, and time diverted from supporting the 32bit version.  Thx for the info though... looking forward to IMatch 6!

StanRohrer

File size: 7.27 GB
Folders: 1606
Files: 146,612
Categories: 24,344
Total size: 1.32 TB

98% JPG files. I have never trimmed any carried-in categories from IM3 (I'm scared to screw up my database).

Aubrey

File size: 2.63 GB
Folders: 1,660
Files: 56,222
Categories: 5,950
Total size: 600 GB

jelvers

And here are my data (lots of Raw files!!):

File size: 4.52 GB
Folders: 1,239
Categories: 277
Total Size: 117 TB

Regards, Juergen

Mees Dekker

117 TB: wow!!!!

How many files does it take to fill that big a storage?

ovrevid

File size: 1,90 GB
Folders: 1233
Files: 56879
Categories: 1319
Total size: 404 GB
-- Vidar

sinus

mostly images, mostly: nef, jpg
also other files: txt, indd, pdf, doc, mp3 usw...

File Size: 12.74 GB
Folders: 759
Files: 240'318
Categories: 12'853
Total Size: 2.39 TB
Best wishes from Switzerland! :-)
Markus

jelvers

Quote from: Mees Dekker on January 16, 2017, 10:16:27 AM
117 TB: wow!!!!
How many files does it take to fill that big a storage?

Mees, you are right! I meant 1.17 TB!!

Mario

IMatch 6 will come with improved support for videos.
And if you work with videos, 100 TB of managed file data is not uncommon. An HD video file has between 7 and 12 GB. Don't ask about file sizes of 4K or 8K videos...
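
To put those figures in perspective, here is a quick estimate of how many HD videos of that size add up to 100 TB. It is only a sketch, assuming decimal units (1 TB = 1,000 GB); the function name is invented for this example.

```python
TOTAL_GB = 100 * 1000  # 100 TB, assuming 1 TB = 1,000 GB


def videos_that_fit(total_gb: int, gb_per_video: int) -> int:
    """Number of whole video files of a given size that fit into the total."""
    return total_gb // gb_per_video


for gb_per_video in (7, 12):
    count = videos_that_fit(TOTAL_GB, gb_per_video)
    print(f"{gb_per_video} GB per video: about {count:,} videos")
```

So 100 TB corresponds to somewhere around ten thousand HD video files, far fewer files than most of the image databases in this thread.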

BanjoTom

Images, videos, Office files, PDFs, and a few more . . .
File size:  1.60 Gb
Folders: 3973
Files: 69,624
Categories: 3026
Total size: 475.30 Gb
— Tom, in Lexington, Kentucky, USA

herman

Only images, both raw and jpg

File size: 587,03 MB
Folders: 1.237
Files: 18.004
Categories: 533
Total size: 225,02 GB
Enjoy!

Herman.

jch2103

I created a Google Sheet with the data submitted so far:
https://docs.google.com/spreadsheets/d/1-Br8lx8XvoOW-fzuzf7KXzwE0vnrHGnvHJYK1cPn4V8/edit?usp=sharing

Feel free to add new data and/or analysis.
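
For anyone who prefers a script to a spreadsheet, here is a minimal sketch of the kind of analysis asked for in the first post. It uses only the file counts posted in this thread up to this point (image databases only; jelvers gave no file count), and it does not reproduce whatever formulas the sheet uses.

```python
import statistics

# File counts as posted in the thread so far.
file_counts = {
    "axel.hennig (images)": 103_666,
    "Kucera": 12_623,
    "jch2103": 55_444,
    "pajaro": 138_721,
    "DigPeter": 54_646,
    "Mees Dekker": 71_008,
    "Jingo": 62_610,
    "jeknepley": 259_810,
    "percythomas": 31_977,
    "StanRohrer": 146_612,
    "Aubrey": 56_222,
    "ovrevid": 56_879,
    "sinus": 240_318,
    "BanjoTom": 69_624,
    "herman": 18_004,
}

counts = sorted(file_counts.values())
median_files = statistics.median(counts)
quartiles = statistics.quantiles(counts, n=4, method="inclusive")
share_below_70k = sum(c < 70_000 for c in counts) / len(counts)

print(f"databases: {len(counts)}")
print(f"median files: {median_files:,}")
print(f"quartiles: {[round(q) for q in quartiles]}")
print(f"share below 70,000 files: {share_below_70k:.0%}")
```

With so few self-selected replies this says little about IMatch users in general, but it shows how a statement like "X% have fewer than 70,000 images" can be derived.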

John

meyersoft

The document is read-only, so here are my statistics:
File size: 1.92GB
Folders:2448
Files: 65301
Categories: 4997
Total Size: 530 GB

jch2103

Quote from: meyersoft on January 16, 2017, 07:54:53 PM
The document is read-only...

Update: Sheet can now be edited; I've added data from meyersoft.
John


lanerellis

File size: 3.79 GB
Folders: 3,763
Files: 140,823
Categories: 25,887
Total size: 349.64 GB

Cheers! :-)

Mario

25K categories?
Then IMatch 6 will save a lot of RAM to manage those. And this RAM can be used by IMatch to make other things faster  :)

jeknepley

Preaching to the choir ;D

jch2103's spreadsheet lists my DB as one of the larger ones.

It's a tribute to Mario's genius that things run smooth and fast (rarely, almost never, is there even the slightest delay - enter a command and immediately see the result).

Thanks, Mario. It's a treat to use such a great product.

hluxem

mostly jpg images and videos
File size:  5.62 Gb
Folders: 1312
Files: 195,554
Categories: 6464
Total size: 2.17 TB

Heiner

pmbvw

File size: 8,53 GB
Folders: 1.184
Files: 171.827
Categories: 1.470
Total size: 865,85 GB


Tallpics

These are my stats.

But firstly I'll give a bit of background info.

I'm a very long-time and satisfied user of IMatch and could not function without it. That's why I have so many backups :-) I've invested a MASSIVE amount of time cataloguing my work and would never want to complete it again!

For the last 11 years I have worked as a professional Motorsports, Music Festival, Sports, Industrial and Press photographer.

However my databases cover a full 18 years of digital shooting.

My overall file numbers are HUGE but they include many 'duplicate pics' in the form of:

Original RAW files / Edited versions / Re-sized web versions / Supplied versions

You may not believe me when I say that I edit down the number of 'keepers' ruthlessly.... but I do!

You must remember that in my work (particularly shooting Sports) I end up with quite a lot of shots.

I've researched my databases and find that I actually shoot only approx 16,000 original images a year.


My total of Images is contained in FOUR IMatch databases that each cover specific areas of my work.

However up until last year the two largest databases were combined into a very large database of approx 4.50 TB!!

Even at that size I had no issues with the way IMatch ran! This is an AMAZING piece of programming by Mario!

I only split the databases into two to allow for future increases in file numbers and this has bought me more time ;-)

It is my view, based on personal experience, that an 'average' user of IMatch will never run into a situation where IMatch cannot handle all their files.

Database-1

File Size:        9.93 GB
Folders:                178
Files:              139,858
Categories:      4,964
Total Size:     1.67 TB

Database-2

File Size:       21.34 GB
Folders:                  41
Files:               275,481
Categories:       4,664
Total Size:      3.21 TB

Database-3

File Size:         4.69 GB
Folders:                581
Files:                 32,473
Categories:           444
Total Size:    414.70 GB

Database-4

File Size:        2.48 GB
Folders:               801
Files:              11,457
Categories:        708
Total Size:  69.11 GB

COMBINED DATABASE TOTALS

File Size:        38.4 GB
Folders:             1,601
Files:             459,287
Categories:   10,780
Total Size:     5.36 TB
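
The combined totals can be cross-checked by summing the four databases listed above. This is just a sketch over the posted counts. The folder and category sums match the combined figures; the file counts as listed add up to 459,269, slightly below the 459,287 given in the totals, and the thread does not say which figure is off.

```python
# Per-database counts as posted above.
databases = {
    "Database-1": {"folders": 178, "files": 139_858, "categories": 4_964},
    "Database-2": {"folders": 41, "files": 275_481, "categories": 4_664},
    "Database-3": {"folders": 581, "files": 32_473, "categories": 444},
    "Database-4": {"folders": 801, "files": 11_457, "categories": 708},
}

# Sum each column across all four databases.
totals = {
    key: sum(db[key] for db in databases.values())
    for key in ("folders", "files", "categories")
}

for key, value in totals.items():
    print(f"{key}: {value:,}")
```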

Mario

Quote
It is my view, based on personal experience, that an 'average' user of IMatch will never run into a situation where IMatch cannot handle all their files.

IMatch 6, thanks to the improved category management, will be able to handle even larger databases.

Lord_Helmchen

Only photos, about 50% raw and 50% JPEG

File Size: 6.79 GB
Folders: 1,895
Files: 166,603
Categories: 4,202
Total Size: 1.17 TB

I entered my data also in the sheet and my DB is pretty close to the "quantile at 0.75" figures - except for categories, perhaps I do not invest enough time into tagging my photos...  :o

rgdudley


My image database:
File Size: 2.12 GB
Folders: 901
Files: 31,591
Categories: 2,513
Total Size:  104.75 GB

Richard
R

Frank

    Database file size on disk:   4,59 GB
    Number of folders:            1.285
    Number of files:              80.254
    Number of categories:         14.246
   Total Size                               2,34 TB


loweskid

Added my stats to the spreadsheet -

Database file size on disk:   1.38 GB
Number of folders:               2,262
Number of files:                   57,748
Number of categories:         1,318
Total Size                            1.63 Tb