How to safely cycle through all the files in a database?

Started by Carlo Didier, November 18, 2017, 09:52:02 PM


Carlo Didier

I need to run a script against all the files in the database. Currently I have found this:
        function ProcessAllFiles(){

            IMatch.get('v1/files',{
                idlist: IMatch.idlist.allFiles,
                fields:'id,name,path'
            }).then(function(response) {

                AssignEventCategories(response.files);

            },
            function(error){
                console.log("Getting all files didn't work ...");
                console.log(error);
            });
        }

But this makes my PC run out of memory (16GB) and crashes IMatch ... The database has ~98000 images.
Is there a way to cycle through all the images without first getting them all through an endpoint?

Mario

This seems unlikely.
I tried that with a database of 480,000 files and checked the memory consumption in my browser with the memory profiler in the developer tools.

Fetching all 480,000 files takes a few seconds and the browser allocates about 70MB of RAM to hold the result (mostly the long file names in strings).
IMatch needs about 650 MB of RAM at that time.

I'm not sure what your AssignEventCategories function does, but I would take a close look at this.

It may be smarter to process the files in batches.
Idlists have features which allow you to step through them in batches.
See IdList Pageing at https://www.photools.com/dev-center/doc/imatch/tutorial-recipes.html

Or first retrieve all file ids by requesting only the 'id' field. Then iterate over that array with additional calls to /files, maybe requesting 5000 files per batch.
See, for example, the ProcessFiles sample file.
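
A minimal sketch of the second approach (get the ids first, then work through them in batches). The names chunkIds, processInBatches, fetchBatch and handleFiles are made up for this illustration and are not part of the IMatch API. How a batch of ids is handed to /files is deliberately left to the fetchBatch callback, so check the endpoint documentation for the exact parameter.

```javascript
// Split an array of file ids into consecutive batches of at most `size` ids.
function chunkIds(ids, size) {
    if (!Number.isInteger(size) || size <= 0) {
        throw new RangeError('size must be a positive integer');
    }
    const batches = [];
    for (let i = 0; i < ids.length; i += size) {
        batches.push(ids.slice(i, i + size));
    }
    return batches;
}

// Work through the batches one after another, so only one batch of
// file records is held in memory at any time.
// fetchBatch(idBatch) -> Promise resolving to an array of file records (hypothetical callback)
// handleFiles(files)  -> does the actual work on one batch (hypothetical callback)
async function processInBatches(ids, size, fetchBatch, handleFiles) {
    let processed = 0;
    for (const batch of chunkIds(ids, size)) {
        const files = await fetchBatch(batch);
        await handleFiles(files);
        processed += files.length;
    }
    return processed;
}
```

In an IMatch app, fetchBatch would wrap a call to IMatch.get('v1/files', ...) for the ids in the batch and return response.files, and handleFiles would be something like AssignEventCategories. Awaiting each batch before requesting the next keeps memory use flat regardless of database size.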
-- Mario
IMatch Developer
Forum Administrator
http://www.photools.com  -  Contact & Support - Follow me on 𝕏 - Like photools.com on Facebook

Carlo Didier

Hi Mario,

It looks like something else went wrong. I don't know exactly what, but I suspect that the
console.log(JSON.stringify(response.files,null,2));
which I had inserted for checking was the culprit.
console.log(JSON.stringify(response.files.length,null,2));
does return 99073, so I'm pretty sure I get all the files.

I noticed that the display in the App tab of the Output panel isn't synchronous with the script execution in this case.
Example:
IMatch.get('v1/files',{
    idlist: IMatch.idlist.allFiles,
    fields:'id,name'
}).then(function(response) {
    console.log("got them");
    console.log(response.files.length);
    console.log(response.files);
    console.log("------");
});

will display (after just 2-3 seconds)
got them
99073
------

and only minutes later does the list of images appear, although it should come before the "------".
This is confusing, because I thought it wasn't working while it actually was (in the background). So I started it several times, and that probably caused the memory trouble and the crash.


Mario

Don't dump so much data into the browser console. It will break.
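
For checking purposes it is usually enough to log the number of files plus a small sample. A minimal sketch, where summarizeFiles is a made-up helper name and not part of IMatch:

```javascript
// Return the total count and only the first few records,
// so the console never has to render the full result.
function summarizeFiles(files, sampleSize = 5) {
    return {
        count: files.length,
        sample: files.slice(0, sampleSize)
    };
}

// Inside the .then() handler, instead of console.log(response.files):
// console.log(JSON.stringify(summarizeFiles(response.files), null, 2));
```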

Carlo Didier