Application Limits

Started by kylesk, November 18, 2016, 05:14:17 PM


kylesk

Sorry for 2 posts in 2 days, but they are unrelated.

Have a few questions for this one, but first my setup\scope:

Currently I have all of my content on a 4x 15k rpm SAS array, the IMatch DB is on a 256 GB SSD, and the server has 6 CPU cores with hyperthreading and 128 GB of memory.

I have imported about 500k photos so far in batches, and it definitely takes a while, since I have to keep coming back to my server to import more.

There are still about 3 million photos, and I would love to just set it, forget it, and let it run for a few days.

Do you guys have a best practice on how many photos to import at a time? I have noticed that every once in a while I get a popup error saying out of memory. Maybe just an app resource limitation.

Also, is there a way to schedule metadata write-backs? Kind of like a maintenance mode. During the day I can work with the DB, and then at night when I am sleeping it can work on write-backs.

Last question. Is there a limit on the write-back queue length?

Thanks a ton!

Mario

#1
Quote: There are still about 3 million photos, and I would love to just set it, forget it, and let it run for a few days.

YIKES! This is an enterprise-grade image collection. And way beyond what IMatch can handle in a database.
IMatch databases are designed to handle the typical image collection size of typical users, which means something between 50,000 and 300,000 files.
And that is already more than you can handle with most of the competitors.

500,000 photos in one IMatch database is already stretching the limits. I did not design IMatch to handle this. And features like data-driven categories, filters, sorting, searching etc. will be quite slow. They become slower the more files you add to your database. For example, analyzing the data of 500,000 files in order to update only one data-driven category is already a massive amount of work.

Handling millions of images usually requires software like Widen or AssetBank, with a dedicated server farm consisting of multiple machines, a distributed database etc. Nothing you can handle with software that cost you a hundred bucks, sorry.

Check out the major DAM vendors (Widen, Canto, FotoWare, Extensis). A DAM consultant will get in touch with you, gather your requirements and then come up with an installation plan for your on-site installation. Or maybe a cloud-based contract. This is way out of the league of a 100$ software like IMatch. For 100$ you get maybe one work hour of a DAM consultant who figures out how much software and hardware you need to manage 3.5 million photos... ::)

sinus

Or you use IMatch and divide the DB into pieces of say, 400'000 images per DB.
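As a rough back-of-the-envelope sketch of that split: the totals come from this thread (about 500k imported plus about 3 million remaining), and the 400,000 cap is the figure suggested above. The function name is made up for illustration and has nothing to do with IMatch itself.

```python
def databases_needed(total_files: int, max_files_per_db: int) -> int:
    """Return how many databases are needed if each holds at most max_files_per_db files."""
    if total_files < 0 or max_files_per_db <= 0:
        raise ValueError("total_files must be >= 0 and max_files_per_db must be > 0")
    # Integer ceiling division: round up so that no file is left without a database.
    return (total_files + max_files_per_db - 1) // max_files_per_db


# Figures from this thread: ~500k already imported plus ~3 million still to go.
total = 500_000 + 3_000_000
print(databases_needed(total, 400_000))  # 9
```

So a collection of this size would end up in roughly nine databases at that cap.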

I now have 230'000 images without problems. It is sometimes a BIT slow, but without troubles.
There are users out there with about 400'000 - 500'000 images.
Best wishes from Switzerland! :-)
Markus

kylesk

Hey, I am up for trying to divide the DB up :)

ChrisG.

Yes this is definitely an enterprise level requirement. It's amazing that IMatch manages more than 100k images. Adobe Lightroom can barely cope with this number of images.

For comparison, I'm managing an archive of 300k+ assets on a FotoWare DAM running on an enterprise server with 24 cores, SSD RAID enterprise drives, 128GB RAM and enterprise-grade RAID storage. It's a blazing fast system, but metadata updates of hundreds of gigabytes of images still take time. Some of my files are in the GB range, and this can challenge even the most expensive systems.

Maybe you can break up your collection into more manageable numbers?

Mario

#5
Quote from: ChrisG. on November 20, 2016, 01:31:37 PM
Yes this is definitely an enterprise level requirement. It's amazing that IMatch manages more than 100k images. Adobe Lightroom can barely cope with this number of images.

For comparison, I'm managing an archive of 300k+ assets on a FotoWare DAM running on an enterprise server with 24 cores, SSD RAID enterprise drives, 128GB RAM and enterprise-grade RAID storage. It's a blazing fast system, but metadata updates of hundreds of gigabytes of images still take time. Some of my files are in the GB range, and this can challenge even the most expensive systems.

Maybe you can break up your collection into more manageable numbers?

Thanks for the feedback, very useful.
IMatch databases are usually in the range of 50,000 to 150,000 files (from the feedback I get). Quite a number of users manage 200,000 to 300,000 files in a database. Some even 500,000. But that's stretching it of course.

There is no free lunch, and searching 200,000 files will take roughly twice as long as searching 100,000 files. The same O(n) runtime can be expected from operations like filters, sorting, and data-driven category updates. Not to mention that IMatch has been designed to hold as much data as possible in memory, and the more files are in a database, the more memory it needs for categories (which use the largest chunk of RAM, apart from viewing images).
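To put numbers on the O(n) point, here is a minimal sketch that extrapolates a measured runtime linearly. The 2-second baseline is a made-up figure for illustration, not a measurement from IMatch, and real operations will not scale perfectly linearly.

```python
def estimate_linear_runtime(measured_seconds: float, measured_files: int, target_files: int) -> float:
    """Extrapolate a runtime to target_files, assuming it grows in proportion to the file count (O(n))."""
    if measured_files <= 0:
        raise ValueError("measured_files must be > 0")
    return measured_seconds * target_files / measured_files


# Hypothetical baseline: a search that takes 2 seconds with 100,000 files.
baseline_seconds = 2.0
print(estimate_linear_runtime(baseline_seconds, 100_000, 200_000))    # 4.0
print(estimate_linear_runtime(baseline_seconds, 100_000, 3_500_000))  # 70.0
```

Under that assumption, an operation that feels instant at 100,000 files becomes a wait of over a minute at 3.5 million.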

I think that managing 300,000 files on a regular PC with a 100US$ software like IMatch is quite a bit of an achievement. Especially when you consider what IMatch does on top!

I don't know what an Enterprise edition of FotoWare costs these days or how much you paid for your super-server, but I bet it exceeds 100 bucks  ;D

Adding or updating metadata in gigabyte-sized files will take a long time, especially over a network. IMatch uses ExifTool for this. And ExifTool has been written with a safety-first approach, using a streaming model. This means ExifTool pulls the entire file over the network, splices it locally to make room for new data etc., and then streams it back over the network once the work is done. I sometimes wish that small operations like setting a rating were faster, but even such "one byte" updates require updates in several places, re-calculations of digests etc. Metadata is a true mess.
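A rough sketch of what that round trip costs, based only on the description above (the whole file is pulled over the network and then streamed back). It ignores local processing time and protocol overhead, and the file size and link speed in the example are assumptions for illustration.

```python
def round_trip_seconds(file_size_bytes: int, bandwidth_bytes_per_second: float) -> float:
    """Estimate the network time for reading a whole file and writing it back once."""
    if bandwidth_bytes_per_second <= 0:
        raise ValueError("bandwidth_bytes_per_second must be > 0")
    # The file crosses the network twice: once to read it, once to write the updated copy.
    return 2 * file_size_bytes / bandwidth_bytes_per_second


# Assumed example: a 2 GB file over a 1 Gbit/s link (about 125 MB/s at best).
print(round_trip_seconds(2_000_000_000, 125_000_000))  # 32.0
```

So even a "one byte" change such as a rating can cost about half a minute of pure transfer time per gigabyte-sized file on such a link.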

ChrisG.

Most enterprise DAM systems will cost six figures. FotoWare does everything on the server side. The clients send the metadata changes to the server, and this keeps network traffic very low and the system fast. The system does not use data-driven categories, since this could slow down the entire system. Instead FotoWare uses search-driven categories and taxonomies, which the system handles very easily.

Mario

#7
How many (concurrent) users do you handle with this setup?

IMatch Anywhere™ also does everything on the server - super-low networking traffic. Fast even on smart phones or over mobile connections. And you can handle databases with 300,000 files with it! :)

Version 1.0 is almost ready to be released. I'm already working on the final automated build process.

Version 1.0 is "browsing only" but can already give a large number of users access to an IMatch database - from a web browser.

Later editions of IMatch Anywhere will add write-back features (metadata, categories, file upload etc.) and then users can do a lot of DAM for very little money on all devices and platforms. Only the IMatch server runs on Windows, the user interface for users is cross-platform.


ChrisG.

The system currently only has 30 users with 2 librarians doing most of the batch processing. Search and downloads and small updates are handled very easily. The biggest bottleneck is the storage and the network. There is no way around big files, slow storage and network. Having the ability to schedule XMP metadata updates helps to improve system performance and availability.

Like everyone else I'm looking forward to seeing the release of IMatch Anywhere. It's looking good!

Mario

#9
Quote: It's looking good!

The fun thing is: It's now looking even better. The video is still from an early Beta.
The UI has been cleaned up and simplified, making it even easier to use. Without sacrificing functionality.

It now installs in less than one minute.
The only option to initially configure is to select the IMatch database you want to "publish".
Another click and your browser opens IMatch WebViewer for a first look.
"Download to go" in maybe two minutes.

Still, IMatch Anywhere has many options to configure, to adapt it from "home usage" to "corporate usage".
Many security features.
User Management. Group Management.
Many options to control who can access what.

And, probably interesting for you, there is a neat Administrator Panel, which shows very useful statistics and allows you to control the system remotely.
Here is a screen shot from my web browser (reduced to 800 pixels, sorry):


kylesk

Hey Mario,
I do have a TON of images, but I am a small-time online entrepreneur and the price for this software is perfect. An enterprise-level DAM is just not even close to feasible for me lol.

I have tried out quite a few open source DAMs over the past few days and honestly... none of them come close to the features and flexibility of IMatch. I do not work with a ton of metadata. All I mainly need to do is assign a handful of tags\pins etc. to each stack. The way you have it laid out with all the rule flexibility feels like it is much more suitable to people that have SQL\scripting experience... which I do.

In this software... currently I have 721,293 images and it is running like an absolute champ  :D

I decided to pull the DB off of the SSD and put it on the SAS array, and that definitely boosted performance. The DB is currently 10GB.

Mario

Quote: The way you have it laid out with all the rule flexibility feels like it is much more suitable to people that have sql\scripting experience.. which i do.

I always try to implement features in a way that makes them easy to use, but without limiting features for power users.
There are many entry-level, dumbed-down (Apple mode) image managers around for free or at a cost. There were many more in the past, but they have been weeded out by the "free" on-line image services.

IMatch provides an Enterprise-DAM set of features for a very fair price. This may make the learning curve a bit steeper in the beginning, but IMatch won't let you down in the middle or break when you try to add more than 50,000 files to a database (catalog). I recommend you keep your database size within sensible limits. I can test only databases up to 400,000 files (I don't have more images). I'd say stay under 500,000 or the database may break later when you add more and more categories.

mvkuilen

Quote: IMatch databases are usually in the range of 50,000 to 150,000 (from the feedback I get). Quite a number of users manages 200,000 to 300,000 files in a database. Some even 500,000. But that's stretching it of course.
I've started to add my image collection to IMatch and I'm already at almost 150,000 and have only reached the year 2008. Considering the limits mentioned in the quote, would creating multiple databases be a realistic option? Is there a way for each database to know what the others contain, or would that just make for one big database anyway, so I may as well stick with one?

Mario

#13
My largest database has now 700,000 images and I use it daily.

The quote above is from 2016 and a lot has changed in IMatch since then.
IMatch 2020 is the fastest IMatch version ever, and I have invested a lot of time in making it perform a lot faster with a lot more images.

According to telemetry, the average database size is about 150,000 files and the largest database in use has over 2 million files.
About half of the user base runs databases between 100,000 and 300,000 files.

sinus

Quote from: Mario on August 06, 2020, 09:09:05 AM
My largest database has now 700,000 images and I use it daily.

Good to know.
My database is now 298'256 files.
And I work also every day with it.

And I have no problems, also the speed is fine.
Best wishes from Switzerland! :-)
Markus

mvkuilen

Thanks for the validation. Now to continue to load all the images. About half way there so it should top out at around 300k.

Mario

Mind the tips in the help.

Add files in batches of 20,000 to 50,000 files.
Make backups of your database in-between.
Run a diagnosis and compact for optimal performance.
Make sure your virus checker is not constantly scanning the database.
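One way to plan such batches ahead of time is sketched below. It groups folders, in order, into batches that stay under a file limit. This is only a planning aid under assumptions: the folder names and counts are invented, the function is hypothetical, and it does not talk to IMatch in any way. You would still add each batch of folders to the database yourself and make a backup in between.

```python
from typing import Iterable, List, Tuple


def plan_batches(folder_counts: Iterable[Tuple[str, int]], max_files: int = 50_000) -> List[List[str]]:
    """Group folders, in the given order, into batches of at most max_files files.

    A single folder that is larger than max_files ends up in a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_count = 0
    for folder, count in folder_counts:
        # Start a new batch when adding this folder would exceed the limit.
        if current and current_count + count > max_files:
            batches.append(current)
            current, current_count = [], 0
        current.append(folder)
        current_count += count
    if current:
        batches.append(current)
    return batches


# Invented example: one folder per year with its file count.
folders = [("2001", 30_000), ("2002", 25_000), ("2003", 20_000), ("2004", 60_000)]
print(plan_batches(folders))  # [['2001'], ['2002', '2003'], ['2004']]
```

The oversized 2004 folder is kept together rather than split, so it would need to be broken up by hand if a strict limit matters.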

Adding 300,000 files for the first time will surface a lot of metadata issues and probably some damaged or badly corrupted files, which may cause IMatch, ExifTool, the WIC codec in use, LibRaw or one of the other helper components to struggle or even crash.

This is normal, and when you restart IMatch it will continue where it stopped. But keep the log file to learn which file(s) were processed last and caused the problem. These files then need a review later, to diagnose what the problem is.

None of this harms IMatch. Just be prepared that it can happen when you process 300,000 files (probably) collected over many years.
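For checking what was processed last, a small generic helper that returns the final lines of a text file can be enough. This is a sketch under assumptions: it knows nothing about the format or location of the IMatch log, the function name is made up, and the UTF-8 encoding is a guess (undecodable bytes are replaced rather than raising an error).

```python
from collections import deque
from typing import List


def last_lines_of_file(log_path: str, count: int = 20) -> List[str]:
    """Return the last `count` lines of a text file without loading the whole file into memory."""
    with open(log_path, "r", encoding="utf-8", errors="replace") as handle:
        # A deque with maxlen keeps only the most recent `count` lines while reading.
        tail = deque(handle, maxlen=count)
    return [line.rstrip("\n") for line in tail]
```

Calling it with the path of a saved copy of the log and a count of, say, 50 shows the entries written just before a crash, which is where the problem file is most likely to be mentioned.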