How to archive database

Started by dpop100, November 14, 2016, 02:42:59 AM

dpop100

I have been using IMatch since 3.4 and my database has grown to 21 GB with 523,750 files in 7,176 folders. It seems to be performing a bit sluggishly, and I would have no problem with archiving the first 10 years of photos into their own database and creating a new one. I know I could just stop updating the old one and start a 2016 database, but I was hoping to hold on to the last few years. Does anyone have any suggestions or a methodology to do this?

Thanks
David


Panther

I'm sure there are several ways to do this, but my approach would be to simply make two copies of the database (both the image files themselves and the IMatch database files). Rename those databases appropriately (one for the first 10 years, the second one for everything else).

Then, open the first one in IMatch and tell it to delete everything after the first 10 years, and then close that database. Then, open the second one in IMatch and tell it to delete everything in the first 10 years, and then close that database.

Once you are sure the two new databases are doing what you want/expected them to do, you can retire/move/stop using your old combined database and you'll have two smaller databases to work with.

sinus

Hm, a good question, which applies to me as well.
I now have 13 GB with 230,000 images, so only about half of yours.
But anyway, I have thought about splitting the DB too.

What do you mean by "sluggish"?

I think it makes sense to divide the DB into two or more at some point.
I mean, files are bigger nowadays, and although computers are faster, it must make a difference whether IMatch has to handle 150,000 or 600,000 files. For example, when I run a compact and optimize, it now takes 9 minutes.
With a smaller DB it is done in 1 minute or so.

If I divide my DB (and I know the day will come  ;D), I will also do it in pieces of years.
When I cut the DB one day, I guess I will divide it not into 2 but into 3 pieces.
Something like you wrote: the last 10 years, the years before, and the years from now on.

I think for the current DB with new images I will also include the last year or two (right now I would use 2015, 2016 and 2017), because the chances are quite high that I will want to look something up in the last year (or the year before).

I think it is clear to you how you could divide the DB; there are several ways, of course.

And, btw, who knows: since the DBs are growing bigger and bigger, maybe one day Mario will have a genius idea and make it possible to search across several IMatch DBs. I had this once in the old Atari times. This was really cool. But OK, Mario, just in case you are listening  ;D ... I guess such an idea is far, far away from your thoughts.  ;D

So, finally, good luck, David. I am interested in what others say to your question.
Best wishes from Switzerland! :-)
Markus

Mario

The procedure recommended by Panther is the usual way to go. 500,000 files is really a lot.

The easiest way to do it:

1. Make a copy of the database.
2. Rename the two copies so you can easily identify what they contain.

3. MAKE BACKUPS of both databases!!

4. Open the first database.

5. Make sure pending metadata has been written (no yellow pens in the file window) for all files you want to remove from that database.

6. Use the "Remove folder from database" command to remove the folders you no longer want to manage in this database (e.g., remove folders with images older than 10 years).
The "Remove Folder from Database" command will remove all information about these folders from the database, but not the images from your disk.

7. Open the other database. Check for pending write-backs. Then remove all folders newer than 10 years, using the same procedure as above.

You will end up with one database for files older than 10 years and one with all newer images.
You don't need to split your images or move them if you don't want to.

If you also want to move the 10 year old files to another disk, do so before splitting the database.
Move them inside IMatch. Or move them in Windows Explorer or whatever and then use the "Relocate" command in IMatch to tell IMatch where you have moved the files.
Then start with 1.
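
Steps 1 to 3 above (copy, rename, back up) can also be scripted. The following is only a minimal Python sketch, not an IMatch feature: it assumes the database is a single file and that IMatch is closed while the file is copied. The function name, the file names and the backup folder are hypothetical placeholders.

```python
import shutil
from pathlib import Path
from typing import Tuple


def split_copies(source_db: Path, archive_name: str, current_name: str,
                 backup_dir: Path) -> Tuple[Path, Path]:
    """Create two clearly named copies of one database file and back up both.

    Assumption: the database is a single file and is not open in IMatch.
    The original file is left untouched.
    """
    source_db = Path(source_db)
    if not source_db.is_file():
        raise FileNotFoundError(source_db)

    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    copies = []
    for name in (archive_name, current_name):
        target = source_db.with_name(name)
        if target.exists():
            # Refuse to overwrite an existing database by accident.
            raise FileExistsError(target)
        shutil.copy2(source_db, target)          # steps 1 and 2: copy under a new name
        shutil.copy2(target, backup_dir / name)  # step 3: back up the copy
        copies.append(target)
    return copies[0], copies[1]


# Hypothetical usage:
# split_copies(Path("D:/imatch/photos.db"), "photos-archive.db",
#              "photos-current.db", Path("E:/backup"))
```

Removing the unwanted folders (steps 4 to 7) still has to be done inside IMatch itself.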





ubacher

I also plan to do exactly the same and split the DB. I "only" have 280 000 files and some operations are getting rather slow.
(I already have a second db but with images which are quite separate so that the split is not that bothersome.)

Another idea I had was to work on a new db for the current year and then add the files to the "full" db at the end of the year.

I actually do this when returning from travelling. The db on the laptop holds only the files taken during travel = thus small and fast.
Merging DBs this way is not easy; I seem to run into some sort of trouble each time.


Working with files that are spread across two different DBs is where it gets difficult. I tend to select the images I want in one DB and copy them to a temporary folder, which I ingest into the other DB. This way I can at least work on all selected files from within one DB.




sinus

Splitting a DB will become a question for a lot of users, because in the digital world, with ever bigger file sizes, the DBs are growing quite fast.

I will also think about the best solution for me.
Best wishes from Switzerland! :-)
Markus

Mario

A typical cause of slow-downs when a database grows is too many data-driven categories.

Before you do anything rash, reduce the number of data-driven categories you use. Or set all data-driven categories you don't need all the time to manual update. This can improve performance a lot.

Nobody mentioned exactly which operations become slower so it's impossible to give detailed tips. Telling us that "some" operations are getting slower is not useful.

Of course searching 200,000 files instead of 100,000 files will be slower. Same for filtering etc.
But IMatch does not get slower per se just because you add more files.

For a database of this size I strongly recommend SSD storage for the database file. My personal 350,000 files database performs very well so far. It's on a SATA 6 Gb/s SSD.

sinus

Quote from: Mario on November 16, 2016, 11:17:38 AM
Nobody mentioned exactly which operations become slower so it's impossible to give detailed tips. Telling us that "some" operations are getting slower is not useful.

Mario, this is a good point.
As soon as I have more time, I will look into this a bit more and report which operations are slower.

One question though:
you mentioned data-driven categories.
I think you also mean @Keywords here, because they are also a kind of data-driven category.

And yes, the SSD is a good tip also.  :D
Best wishes from Switzerland! :-)
Markus

Mario

@Keywords is essential. You cannot remove it or set it to manual. Many features in IMatch depend on it.

The log file logs all 'slow' operations with the special prefix #sl
This is how we could easily identify a while ago that your many categories were causing problems, if you remember.
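
To pull those entries out of a large log file, a few lines of Python are enough. This is only a sketch under assumptions: it treats the log as a plain text file and simply keeps every line that contains the literal marker `#sl`; the exact layout of a log line is not described in this thread, and the file name in the usage comment is hypothetical.

```python
from pathlib import Path
from typing import List


def slow_entries(log_path: Path, marker: str = "#sl") -> List[str]:
    """Return every line of a text log that contains the slow-operation marker."""
    found = []
    # errors="replace" keeps the scan going even if the log has odd bytes.
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if marker in line:
                found.append(line.rstrip("\n"))
    return found


# Hypothetical usage:
# for entry in slow_entries(Path("imatch_log.txt")):
#     print(entry)
```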

If you still run IMatch on a regular disk, upgrading to an SSD can double the performance, or more. A 256 GB SSD now costs only about 70€ and it will not only speed up IMatch but Windows and other applications as well.

If you go to other DAM vendors like Canto, FotoWare, Extensis, AssetBank or Widen and tell them you want to manage 300,000 files, they will very likely suggest that you set up a dedicated server farm...  ;)  And the DAM consultant who tells you that charges US$800 per day  :-[ ::)

dpop100

Thank you all for your useful replies! Please excuse my rudeness for asking a question and then abandoning the forum for a month. My sincere apologies.
ubacher, Panther and sinus - thank you for validating my issue and for your straightforward suggestions. I now feel confident moving forward with the split, knowing I have not overlooked some important aspect of IMatch.

Mario - thank you for your response and very specific instructions and cautions on the splitting process. My intent was not to bash the performance, just to recognize that I have a lot of data! I agree with your comment "IMatch does not get slower per se just because you add more files". That is why I said sluggish as a relative term. I have attached the application log in case it provides any clue to my performance issue. Where I notice slowness is on initial catalog load, moving from folder to folder (folder image refreshes) and the backup/optimization process.

I'm not sure I understand what data-driven categories are. I use the IMatch 3.x categories only, and most images only have 4-5 assigned - usually Year, Location, Event, People. I have relocated previous years' original image files to external drives, with separate 1 or 2 TB drives for each year. (Also backed up to optical media.)

I know that an SSD is a great performance enhancer. It is on my wish list but not in my budget, as I store about 1 TB of images per year and 1 TB SSDs are still about 300 USD.

I appreciate all of the worthy suggestions! Thank you.


sinus

Thanks, dpop100

A nice post from you.
Data-driven cats are useful. I underestimated them at the beginning.

Usually (I believe) IMatch comes with some data-driven cats under a category called something like "IMatch Sample Categories".

Data-driven cats pull some data out of the images, say the camera model, and then automatically create several cats for you, depending on the model.

I suddenly have, for example, some cats called something like "Canon", "Pentax" and so on ... without having such cameras (I use only Nikon).
Hmm.... I checked, and these (data-driven) cats neatly contain some images I got from external sources, like example images here on the forum. Cool.

You can do the same with your own data, say, cities; after this you have all your cities in separate cats.

See attachment.
Maybe you should read about these special categories in the help. They may not be easy to understand at first, but they are worth thinking about.  :)
Best wishes from Switzerland! :-)
Markus

Mario

Your database has a whopping 524,767 files. A 20 GB database file.
More than half a million files you are trying to manage with $100 software.

For this amount of files, my competitors would recommend a dedicated server (farm) with multiple cores, 16 to 32 GB of RAM, fast SSD storage etc. Managing half a million files in a single database is not something IMatch was designed for. That's an enterprise data volume, and usually you would look at systems provided by companies like AssetBank, Canto, Widen etc. to manage such massive amounts of assets. For $100 they might tell you what it will cost you...

Your 500,000 files database loads in 45 seconds, which is not bad at all.
Your log file does not contain more, it seems you have copied it just after IMatch came up.

You should

a) Switch IMatch to debug logging (Help > Support)
b) Load a database and perform some operations you consider slow. Work with IMatch for a while.
c) Then ZIP the log file.
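
Step c) can be done with any ZIP tool. As an illustration only, here is a small Python sketch that compresses one log file into a ZIP archive next to it. The function name and the log file name in the usage comment are hypothetical; where IMatch stores its log is not stated in this thread.

```python
import zipfile
from pathlib import Path


def zip_log(log_path: Path) -> Path:
    """Compress a single log file into '<name>.zip' in the same folder."""
    log_path = Path(log_path)
    archive = log_path.with_suffix(".zip")
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname stores only the file name, not the full folder path.
        zf.write(log_path, arcname=log_path.name)
    return archive


# Hypothetical usage:
# zip_log(Path("imatch_log.txt"))  # creates imatch_log.zip next to the log
```

Text logs usually compress very well, which makes the archive easier to attach to a forum post.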

But, frankly, 500,000 files and a 20 GB database is stretching things. A lot. On a very fast system with multiple i5 or i7 cores and super-fast SSD storage it should work OK. But if you plan to add even more files, you will have to split the database.


sinus

Quote from: Mario on December 08, 2016, 02:43:47 PM
Your database has a whopping 524,767 files. A 20 GB database file.
More than half a million files you are trying to manage with $100 software.


Cool and great. And surprising.

But to defend IMatch, I have to say that although I have only 340,000 files (mostly images, but also some music, text and PDFs), IMatch still works really well and is quick enough in most cases.

I will keep working with it for some months and keep an eye on the speed and so on.  ;D

So, Mario, you created an astonishing piece of software!  :D
Best wishes from Switzerland! :-)
Markus

dpop100

Mario -
Yes, it certainly is impressive, and again, I am not grumbling about the performance. I have tried many photo cataloging tools and none come close to the power and general speed of IMatch. I attached a new log file, but I can't say any functions were terribly slow this evening. I browsed from folder to folder, did a folder rescan, filtered by category, and added categories.

Sinus -
Yes I will look into the data-driven categories. I have noticed the ones provided as samples but I never explored what they could do for me.

sinus

Bear in mind that data-driven cats can also make IMatch slower. I, for example, do not refresh these categories automatically (in the preferences); I refresh them before I use them.
To be honest, they are really fantastic, but I do not actually use them that often.
Best wishes from Switzerland! :-)
Markus

Mario

The only slow operations logged are the updates for data-driven categories like ISO.
This is one of the IMatch Sample Categories. If you don't use them, delete them in the Category View. Or set them to manual update (in the properties panel below the categories tree).

Re-calculating data-driven categories is usually super-fast. But re-calculating many data-driven categories for 500,000 files can be a real drag. Especially when you don't need them.

dpop100

Thanks Mario and Sinus. I will try that!