Q: reasonable size of db?

Started by markkums, May 27, 2014, 05:01:34 PM


markkums

Hi all,

I now have 100k+ images in my DB and I feel things have become a bit slow. Maybe this is also due to my system, but it raises two questions in my mind:

1) Is it possible to say what a reasonable size is for a collection managed in one database? I mean, if I had e.g. 10k images in one DB, it would probably run very fast in all respects (I believe the speed of each function does not depend on image count) compared with a DB holding 100k.
In practice, I could imagine keeping my images from 2005-2010 in one DB and images from 2011 to today in another DB.

2) If I follow the above idea and decide to split my image storage between two databases, another question arises: is it (or will it be) possible to combine two databases into one, if a need arises later, e.g. to filter all images together?

B.r: Markku

Ferdinand

Is your DB on an SSD?  If not, it's something worth trying.  My DB is smaller than yours, but not by that much.  When I had it on my RAID 1 array it was a bit too slow.  On the SSD it's alright.

markkums

Hi,

My DB is on a normal hard disk, but I am considering an SSD to upgrade my desktop performance in general.
I need to find out whether an additional SSD can be installed, or whether the whole system needs to be re-installed on an SSD.

B.r: Markku

Richard

Performance - Keep the database in good shape

While IMatch is indexing files, it writes millions of information items to the database. Especially when you add tens of thousands of images in one go, the database may become slower during that process. The second 10,000 files take longer to process than the first 10,000 files. This slowdown is usually not dramatic. But if you add more than 50,000 files to your database, you should run a Database Diagnosis between each batch of, say, 30,000 to 50,000 files.
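To plan a large import along the lines of the advice above, the batch arithmetic can be sketched in a few lines of Python. This is only an illustration: the helper name plan_import_batches and the 40,000-file default (picked from the suggested 30,000 to 50,000 range) are assumptions, and the code does not talk to IMatch at all. The Database Diagnosis itself is still run by hand between batches.

```python
def plan_import_batches(total_files: int, batch_size: int = 40_000) -> list[int]:
    """Split an import of total_files into batches of at most batch_size files.

    Hypothetical helper: it only does the arithmetic. A Database Diagnosis
    would be run manually in IMatch between consecutive batches.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    full, rest = divmod(total_files, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


batches = plan_import_batches(100_000)
print(batches)  # [40000, 40000, 20000]
print(len(batches) - 1, "diagnosis runs between batches")  # 2 diagnosis runs between batches
```

Per the help text, this only matters when you add more than 50,000 files; smaller imports can go in one batch.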
When you run a Database Diagnosis, IMatch will also optimize and compact the database. This restores the performance pretty much to the performance of a pristine database. You can run a database diagnosis at any time. IMatch stops background processing of new files while the diagnosis runs and automatically continues afterwards.

Mario

The slowest process is always adding files to the database. And it gets slightly slower as the database fills up, but not by much. I have written a lot on performance in the help so I suggest you check there as well.

A database of 200,000 or 300,000 indexed assets is really large. IMatch handles this of course, but it naturally takes longer to search 300,000 files than it takes to search 50,000 files when you use a filter or a data-driven category. It's always a trade-off.

Like all database applications, IMatch benefits a lot from fast disk storage. A modern SSD is ideal, but even a cheap high-speed (!!!) USB 3.0 stick can speed up the database by several hundred percent. And even a very large database fits nicely on a US$ 30, 64 GB USB 3.0 stick, and performs well.

Splitting a database is usually not a good idea, because you may end up importing/exporting categories, the thesaurus or whatever all the time, trying to keep the two databases in sync. IMatch has been designed to handle large databases, even very large ones with 200,000 or more files. If you have enterprise-level asset libraries with millions of files, you may consider splitting your IMatch database. Or setting up your own server room with several servers, disk storage systems, cooling, 24/7 administrator support etc. and then buying one of the enterprise-level DAM solutions, e.g. from Canto, FotoWare or others ;)

markkums

Hi,

Thanks for your replies. Yes, it is good to know what Richard and Mario have written. So, if I understand correctly, once indexing is done (when a large volume was added at one time), things may become a bit faster in use.

In my case, most of the files were originally in an IMatch 3 DB which was converted to IMatch 5, and after the conversion I added some thousands of recently created files to IMatch 5.

The idea of a USB 3 stick sounds good and is maybe the easiest to implement. I'll try that first.

B.r: Markku.

Mario

I cannot say how IMatch performs on your system.
Did you run a Database > Tools > Compact recently? This speeds up the database considerably.

Also, if you

- give us some info about your computer
- tell us what you think is "slow"
- tell us what you do in IMatch
- attach a log file

I may be able to give tips, like "If you don't need the sample IMatch data-driven categories, delete them".

The largest database in use (as far as I know) with IMatch 5 Beta has almost 300,000 (!) files. It seems to work quite well.

Richard

Quote: The idea of USB 3-stick sounds good and is maybe the easiest to implement.

I use a USB-3 stick and it does work well for me but I was careful to find one that was high-speed.

pajaro

Quote from: Richard on May 27, 2014, 09:55:12 PM
Quote: The idea of USB 3-stick sounds good and is maybe the easiest to implement.

I use a USB-3 stick and it does work well for me but I was careful to find one that was high-speed.

I confirm that. As Mario advised some time ago, I bought a USB 3.0 flash drive and saved my database on it. It helped a lot to speed things up.

Mario

I was contacted by a Beta user yesterday who asked me if IMatch 5 would be faster when final.
He says he has added 550,000 (!) files so far but was not specific about what he considered "slow". Apparently his ultimate goal is to manage 750,000 files with IMatch 5. I asked him some questions about what he considers slow, and to send me a log file so I can see the performance data IMatch logs.

He also said that Lightroom had no problems managing 750,000 files but that he considers IMatch 5 more powerful. And LR apparently messes up the metadata in his files frequently. From my experience I can say that my LR installations (64-bit, Windows 7/8.1) start feeling slow and irritable at about 50,000 files or so...

But 550,000 files. Jeez.

Ferdinand

Clearly I'm not taking enough photos ....  :o    :o    :o

My earlier comments about SSD and performance were *after* compact and optimise, etc.  The performance on the RAID1 was OK, but a little frustrating.  On the (high quality) SSD it's good.  I imagine a fast USB3 stick would be similar.  But on either device, backup is even more important than usual, IMHO, and it's already extremely important.

Mario

Which features or tasks did you consider "slow" on the RAID?

Importing files?
Writing files?
Category Operations?
Data-driven Categories?
Viewer?
File Windows?
Scripting?
...

For a system as complex as IMatch, there are always trade-offs. My general aim is to do as much as possible when files are ingested, e.g. building caches and pre-calculating information. Other in-memory caches are built during database load. But there are limits to what can be done. Knowing what users consider slow helps me focus my performance tuning.

IMatch 5 in general is much faster than IMatch 3. It has many new "expensive" (in terms of CPU and disk utilization) features added, so in some areas this levels out, despite the fact that IMatch 5 utilizes all processors in the system.

Ingesting metadata is slower than in IMatch 3. This was a decision I made in favor of "richness" and robustness, by using ExifTool to extract and interpret metadata. Ingesting files (thumbnail extraction, visual query data, checksums, cache images, ...) can be up to several hundred percent faster than in IMatch 3, because IMatch 5 does it on every available CPU core simultaneously.

If you have other "slow" spots, let me know.

Ferdinand

I am currently reingesting images into a new DB, so I can't go back and test at the moment.  My comment was not about ingest.  I know that takes time, but for each file it's a one-time event, or it's supposed to be.  Rather, my comment was about the overall responsiveness of the program: moving from folder to folder and category to category, doing searches, and filtering.  These sorts of things are faster on an SSD and just a little too slow on the RAID1.

ubacher

I wanted to read up on some of the performance issues Mario mentioned are explained in the HELP.
Under PERFORMANCE I only found info about the DB tools and about SYNCH mode.
I was looking for where to set FAST, NORMAL, and SYNCH OFTEN mode, but could not find a setting in Preferences.

But my question concerns the effect of open panels on performance when IM is doing a lot of updates.
I understand IM updates open panels - thus I expect that they would slow IM down.
Is this significant when updating 10000 files?  And what is meant by OPEN?


Mario

Edit > Preferences > Database.

All open panels (hidden, floating or collapsed) consume CPU and other resources.
IMatch tries to streamline the process by delaying the updates where possible (e.g. updating and reloading the panel only when the user makes it visible, or it is visible) but there are technical limits of what can be done.

It all depends on which panels you open, in which View you work, and what you do. There is no free lunch, and when you utilize the advanced features of IMatch (data-driven categories, filters, ...) there is a price to pay. Not everything can be made faster by distributing it over multiple processors, so there are limits to how IMatch can 'scale' as well.

@Ferdinand:

Quote: Moving from folder to folder and category to category.  Doing searches, and filtering.

The time it takes to load a file window (bring a category into a file window) depends on how many files you have, if filters are active, how many panels you have open and visible, hierarchy mode and very much on your file window layout (how many data items you display) and the sort profile you have enabled. On my PC, a category with 50,000 files loads in about 4 to 7 seconds (Default file window layout, Default sort). Which is pretty amazing, really.

Ferdinand

Quote from: Mario on May 28, 2014, 03:15:02 PM
Which is pretty amazing, really.

Yes, I know.  I'm not complaining.  I was only saying that an SSD is faster if you're a little impatient, like me.

Mario

An SSD has access ("seek") times hundreds of times faster than a hard disk. This helps a lot for database systems like IMatch.
How long it takes to load a file window depends on the factors I explained above. Since loading a file window may require a lot of data to be loaded directly from the database (if not yet cached), the SSD will boost performance there as well. SSDs are cool by nature  ;)

Erik

While we're talking about performance and catalog size...

Is there a rule of thumb for how big the DB file might be vs. the number of files indexed? Even roughly?

I ask because my home system has an SSD, but I had been keeping the DB off it due to concerns it might get too big (and to see how performance would be on a standard HDD), and the drive is not terribly large. Only on a couple of occasions have I tried putting all my images into a DB during the beta, as I fidget with learning features of the program and cleaning up some metadata issues associated with LR (probably similar to the guy you were talking about).


I've seen the idea of a USB3 thumb drive or SSD for holding the DB, and I guess I should try it out myself.  Is there a huge benefit to putting the cache on an SSD if space allows?


I also wonder: if one had to choose between installing IM on an SSD OR only putting the DB on an SSD (and putting the program itself on a standard HD), which would be better?


... sorry for hijacking the thread and digressing a bit.

jch2103

At the risk of further digression (and not answering your specific questions), be aware that SSD performance will suffer if there's less than about 25% free space on it (depending on the specific unit and how its free space is provisioned, etc.).
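As a rough check of John's rule of thumb, here is a minimal Python sketch that tests whether a DB still leaves about 25% of an SSD free. The function name, the 25% default and the example figures (a 120 GB SSD, an 8.3 GB DB) are illustrative assumptions; the sketch also ignores the difference between a drive's nominal and formatted capacity.

```python
def fits_with_headroom(ssd_capacity_gb: float, used_gb: float, db_size_gb: float,
                       min_free_fraction: float = 0.25) -> bool:
    """Return True if adding db_size_gb still leaves min_free_fraction of the SSD free.

    Hypothetical helper based on the ~25% free-space rule of thumb; real
    drives differ depending on how their free space is provisioned.
    """
    free_after = ssd_capacity_gb - used_gb - db_size_gb
    return free_after >= min_free_fraction * ssd_capacity_gb


print(fits_with_headroom(120, 0, 8.3))   # True: about 111.7 GB left, 30 GB needed
print(fits_with_headroom(120, 85, 8.3))  # False: about 26.7 GB left, 30 GB needed
```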
John

Mario

DB on SSD: good.
Cache Files on SSD: Most likely a waste of time. The typical load time for a cache file from a regular disk is < 1 s.

The size of the database depends on several factors, from the thumbnail size chosen to the amount of metadata in your files etc.
For 100,000 files, database sizes may vary between 4 GB and maybe 8 GB.
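Scaled down, that figure works out to roughly 40 to 80 KB of database per file. A minimal Python sketch of the estimate follows. It assumes the size grows linearly with the number of files, which is a simplification: thumbnail size and the amount of metadata per file shift the result. The function name is hypothetical.

```python
def estimate_db_size_gb(num_files: int,
                        low_gb_per_100k: float = 4.0,
                        high_gb_per_100k: float = 8.0) -> tuple[float, float]:
    """Rough (low, high) DB size in GB, assuming linear scaling of the
    4-8 GB per 100,000 files figure. Hypothetical helper, not an IMatch tool."""
    scale = num_files / 100_000
    return scale * low_gb_per_100k, scale * high_gb_per_100k


low, high = estimate_db_size_gb(40_000)
print(f"{low:.1f}-{high:.1f} GB")  # 1.6-3.2 GB
```

For example, the 6 GB reported below for 100,000 images falls inside the 4 to 8 GB range.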

ubacher

You mean for 100,000 files (not 1,000,000)?
I have 100,000 images and my DB is about 6 GB.

Erik

Thanks guys (for the rough DB size info).  I think I can afford a DB that is less than 10 GB.   I honestly hadn't been paying that close attention to DB size at all, and just assumed with the Thumbnails it could get big.  I think my total DB is only about 40,000 photos, so I should be good to go, and will move my DB over, especially when IM is released.

-Erik





markkums

Hi there,

I have been away for some time, struggling with USB 3.0 and SSD installation and testing.

Let me try to summarize that experience:
Yes, the DB on USB 3.0 gives somewhat faster performance (compared with a standard HDD system), but a clear improvement is achieved with an SSD. My DB is some 8.3 GB, and starting IMatch now takes a fraction of the time it took with the HDD. The same improvement can be seen in all filtering etc. Diagnosis and optimizing improved a lot, except the last step, which is lo..oong. Background processing is very fast. I installed a 120 GB SSD (about 70 euros) as additional storage for data only; there are no system files on it.

I also tried pack and go, moving the package from my desktop to my laptop. IMatch worked OK, but I do not fully understand the purpose of the whole package (I mean, isn't the DB enough to move?), as some of the settings on the laptop were and remained different from the desktop. Did I miss something?

For image processing I use CO7. I also moved its session-db to the same SSD and that also gave better performance, so this makes the whole work flow clearly better!

I understand that the viewer works with the original files, so having the DB on an SSD has little influence on its performance, but some strange behavior remains: sometimes a cube icon with a question mark is shown in the window, and this can only be overcome by closing and restarting the viewer.

B.r: Markku

ubacher

I can confirm that the cube is sometimes displayed. I have been unable to figure out under what circumstances, or how to get rid of it.

Mario

The "unknown packet" icon is displayed when IMatch does not understand the format, the WIC codec returns an error message, the cache image is damaged or unreadable etc. The log file has more info in this case.

markkums

Hi,

This icon appears right in the middle of NEF files: one or two files are shown as the icon, and then the next one appears as a normal image again. There is no difference in format; all files have the same format.

B.r.: Markku

Mario

Without the log file I cannot explain if and what is wrong.
If IMatch has difficulties loading an image, there can be dozens of reasons.

markkums

OK, I'll provide a log from the next occurrence of this icon.

B.r: Markku

Aubrey

Quote from: markkums on June 02, 2014, 08:32:19 PM
This icon comes just in the middle of nef-files, one or two files are shown as icon and then next one again as normal image. There is no difference in format, all files have same format.

Have you tried rescanning using Ctrl+Shift+F5?
I have encountered the cube and found that helped.

Check that you can open the NEF in ViewNX and/or Capture NX2. That may help you identify whether there is an issue with the file.

I also encountered the icon when using an external disk... The issue was that I had not unlocked the disk. Of course the thumbnails looked fine! Took me a while to work out the first time!

Aubrey