Detecting new versions without rescanning

Started by Ferdinand, August 18, 2014, 09:53:36 AM


Ferdinand

I've noticed something odd, and I wonder if this will change in 114 with new ways of detecting versions.

I have a folder with a RAW file and a version in a sub-folder.  File relations are configured and working.  In background processing I have file relations configured to automatically refresh.

External to IMatch I now create a new version in an additional subfolder.  If I add the subfolder to IMatch by dragging it onto IMatch, then the file is not immediately recognised as a version.  I have to rescan the parent folder with the RAW master for that to happen.

Is there any way to have the new file recognised as a version at ingest?  Is this one of the things that will change in 114?   I have a feeling we've discussed this before, but all I can find are requests for automatic propagation at ingest time, although that would need automatic recognition of versions to work.

Mario

When you add a folder, IMatch checks it for masters. For each master (!) found, it scans the folder hierarchy for versions.
If you add a sub-folder which happens to contain some versions for master files somewhere up the hierarchy, IMatch cannot find the masters or detect that the files in the new sub-folder are versions.

In order to do that, IMatch would have to:
a) add/update the folder,
b) check for masters and go down the folder hierarchy to find versions, and then
c) go up the entire folder hierarchy, scanning each parent folder recursively to find masters, and then scan down again to find versions.

This could take a long time and ruin performance badly. There are limits to what IMatch can do. IMatch detects relations from the folder containing the masters, not the other way round.
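A minimal Python sketch of the master-first direction described above. The folder tree is modelled as an in-memory dict, and the relation rule (RAW extensions are masters; a version is a non-RAW file with the same base name in the master's folder or below it) is a hypothetical stand-in for IMatch's configurable rules, not IMatch's actual implementation. It shows why scanning only a newly added sub-folder finds nothing: that folder contains no master to start from.

```python
from pathlib import PurePosixPath

# Hypothetical relation rule: files with these extensions are masters.
MASTER_EXTS = {".nef", ".cr2", ".arw"}


def is_master(name: str) -> bool:
    return PurePosixPath(name).suffix.lower() in MASTER_EXTS


def is_below(folder: str, root: str) -> bool:
    """True if folder is root itself or one of its descendants."""
    f, r = PurePosixPath(folder), PurePosixPath(root)
    return f == r or r in f.parents


def detect_relations(tree: dict[str, list[str]], scanned: str) -> dict[str, list[str]]:
    """Find masters in the scanned folder, then search it and its sub-folders for versions."""
    relations: dict[str, list[str]] = {}
    for name in tree.get(scanned, []):
        if not is_master(name):
            continue  # only masters trigger a search for versions
        stem = PurePosixPath(name).stem
        versions = [
            str(PurePosixPath(folder) / other)
            for folder, files in tree.items()
            if is_below(folder, scanned)  # never look above the scanned folder
            for other in files
            if not is_master(other) and PurePosixPath(other).stem == stem
        ]
        relations[str(PurePosixPath(scanned) / name)] = sorted(versions)
    return relations


# Example: a RAW master with versions in two sub-folders.
tree = {
    "/photos/2014": ["IMG_1.nef"],
    "/photos/2014/edits": ["IMG_1.jpg"],
    "/photos/2014/prints": ["IMG_1.tif"],  # the newly added sub-folder
}

print(detect_relations(tree, "/photos/2014/prints"))  # prints {}
print(detect_relations(tree, "/photos/2014"))  # finds both versions
```

Scanning the parent folder works because the search starts from the master and only ever moves down, which mirrors the "rescan the parent folder" workaround from the first post.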

Ferdinand

Quote: I see your point, although it's hard for us simple users to know what causes substantial performance issues.

It's easy. When IMatch has to process 1,000 files when checking for relations, it has to perform 1,000 x 1,000 (1 million) tests. Each file can be a master, and each file can be a version. So IMatch iterates over the 1,000 files. For each file which meets the "master" criteria, IMatch checks all other 999 files to see if they meet the version criteria. The larger the scope to search, the more time-consuming this becomes.
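To make the arithmetic concrete, here is a Python sketch of the nested loop described above. It illustrates the cost model only and is not IMatch's code; the `is_master` and `is_version` predicates are hypothetical placeholders. The worst case assumes every file passes the master test, so 1,000 files produce 1,000 x 999 = 999,000 version tests, which is the "1 million" quoted.

```python
from typing import Callable


def count_relation_tests(
    files: list[str],
    is_master: Callable[[str], bool],
    is_version: Callable[[str, str], bool],
) -> tuple[int, list[tuple[str, str]]]:
    """Test every other file against each master; return (tests run, relations found)."""
    tests = 0
    relations: list[tuple[str, str]] = []
    for i, master in enumerate(files):
        if not is_master(master):
            continue
        for j, candidate in enumerate(files):
            if i == j:
                continue  # a file is never its own version
            tests += 1
            if is_version(master, candidate):
                relations.append((master, candidate))
    return tests, relations


def same_stem(master: str, candidate: str) -> bool:
    # Hypothetical version rule: same file name before the extension.
    return master.rsplit(".", 1)[0] == candidate.rsplit(".", 1)[0]


# Worst case: every one of 1,000 files is treated as a possible master.
files = [f"IMG_{n:04d}.jpg" for n in range(1000)]
tests, _ = count_relation_tests(files, lambda f: True, same_stem)
print(tests)  # 999000
```

Because the count grows with the square of the number of files, doubling the search scope roughly quadruples the work.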

Unless your relation definition contains specific folders or even the global scope (the worst case), IMatch looks only in the folder currently being scanned, and then goes down from there, depending on how you have configured your relation rules.

If IMatch used the other approach, it would have to scan upwards one, two or more folder levels. For example, suppose you have a folder hierarchy like this:

c:\
|-- c:\images  (20,000 files)
  { several dozen or even hundreds of sub-folders }
  |-- 201408  (100 files)
      |-- versions (100 files)
      |-- print (30 files)


You are currently processing files from the 201408 folder (100 files) in your RAW processor. This produces frequent "folder changed" messages. IMatch rescans the 201408 folder in the background. It then applies all relation rules, scanning the 100 files in that folder to check for masters, and then going down into the versions and print folders to look for new versions. This causes 100 x 130 (13,000) relation operations.

In order to detect that some of the files added to or updated in 201408 are in fact versions of a master in the c:\images folder, or one of the sub-folders of c:\images, IMatch would have to search c:\images and all sub-folders recursively. And this means 20,000 x 20,000 (400 million!) relation operations, instead of the 13,000 in the current case.
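Both figures come from the same count of masters times candidate files. Using the file counts from the example hierarchy, and treating every file in c:\images and its sub-folders as both a possible master and a possible version in the upward case (the worst-case assumption above):

```latex
\begin{aligned}
N_{\text{current}} &= \underbrace{100}_{\text{201408}} \times (\underbrace{100}_{\text{versions}} + \underbrace{30}_{\text{print}}) = 13{,}000 \\
N_{\text{upward}} &\approx 20{,}000 \times 20{,}000 = 4 \times 10^{8} \\
\frac{N_{\text{upward}}}{N_{\text{current}}} &\approx \frac{4 \times 10^{8}}{1.3 \times 10^{4}} \approx 3 \times 10^{4}
\end{aligned}
```

So the upward search is roughly 30,000 times more work than the current downward scan in this example.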

DigPeter

I have a similar situation. I process raw images in IM before converting them to jpegs in Lightroom. The raw files are on my external G drive; the jpegs go onto the internal C drive. Both drives have the same folder structure below the folder "My Pictures". File relations are set up for the raw files on the G drive to be masters and the jpegs on the C drive to be versions. Automatic refresh of versions is set. When I import the jpegs from LR, they are not automatically refreshed. Not a big deal for me; I just do F4 - R in the masters' folder and the new jpeg versions are quickly established.
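With mirrored folder trees like these, a master's location can be computed from the version's path rather than searched for. A Python sketch using `pathlib.PureWindowsPath` (pure path arithmetic, no disk access); the two root paths and the `.nef` extension are hypothetical, since the post only says both drives share the structure below "My Pictures".

```python
from pathlib import PureWindowsPath

# Hypothetical roots: the post does not give the full paths.
RAW_ROOT = PureWindowsPath(r"G:\My Pictures")
JPEG_ROOT = PureWindowsPath(r"C:\My Pictures")


def master_candidates(
    jpeg_path: str, raw_exts: tuple[str, ...] = (".nef",)
) -> list[PureWindowsPath]:
    """Map a jpeg version on C: to the RAW file(s) it could belong to on G:."""
    rel = PureWindowsPath(jpeg_path).relative_to(JPEG_ROOT)  # e.g. 2014\08\DSC_0001.jpg
    raw_folder = RAW_ROOT / rel.parent  # same sub-folder, other drive
    return [raw_folder / (rel.stem + ext) for ext in raw_exts]


print(master_candidates(r"C:\My Pictures\2014\08\DSC_0001.jpg"))
```

The lookup cost is constant per file: a script would only need to check whether one of the returned candidates exists, instead of scanning any folder tree.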

Mario

@Ferdinand: Sorry. I accidentally replaced your post with mine (pressed Edit instead of Quote).

Ferdinand

Quote from: Mario on August 18, 2014, 01:07:47 PM
@Ferdinand: Sorry. I accidentally replaced your post with mine (pressed Edit instead of Quote).

Ahem! :o What I also said was:

IIRC, there are some new scripting methods in 114 that allow me to refresh file relations. Since I know exactly where the master is and what it is called, I can write a script, triggered at ingest time, that finds the master quickly and easily and refreshes file relations. Hopefully this customised approach to finding the master would minimise any performance hit.

Ferdinand

Quote from: Mario on August 18, 2014, 01:07:47 PM
Unless your relation definition contains specific folders or even the global scope (worst case), IMatch looks in the folder currently scanned only. And then goes down from that, depending on how you have configured your relation rules.

The global scope case would be tough, I agree.  But I'd have thought that the sub-folders case would be ok.  After all, if you're looking down the tree to find versions then potentially there is an expanding tree, which is what the current refresh relations does.  If you're doing the reverse and looking up the tree for a master then you're searching a narrow and specific path.  I think this statement is misleading:

Quote from: Mario on August 18, 2014, 01:07:47 PM
In order to detect that some of the files added to or updated in 201408 are in fact versions of a master in the c:\images folder, or one of the sub-folders of c:\images,

It's the last part of this that I think is misleading. Under the "versions in sub-folders" setting you're only searching up a narrow vertical path; you're not searching up and across and down. So the number of images you're searching is small, and probably fewer than the number you're searching the other way when you look for versions.

I scripted this for a later version of my image synchronisation script and there wasn't much of a performance hit, and we're only talking about doing this once per file at ingest time.

For the general case, though, it would not be a good idea. But then any sort of relations refresh under the general case is going to be slow.
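The narrow upward search argued for above, limited to the "versions in sub-folders" case, can be sketched as walking up the version's ancestor folders and stopping at the first master with a matching base name. This is a Python illustration on a hypothetical in-memory tree with a hypothetical naming rule, not IMatch code or the synchronisation script mentioned above. The counter shows that only files in ancestor folders are examined; a large sibling folder is never touched. It does not find masters in sibling or cousin folders, which is the case raised in the reply below.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical relation rule: RAW extensions are masters, matched by base name.
MASTER_EXTS = {".nef", ".cr2", ".arw"}


def find_master_upwards(
    tree: dict[str, list[str]], version_path: str, max_levels: Optional[int] = None
) -> tuple[Optional[str], int]:
    """Walk up the ancestors of version_path; return (master path or None, files examined)."""
    version = PurePosixPath(version_path)
    examined = 0
    for level, folder in enumerate(version.parents):
        if max_levels is not None and level >= max_levels:
            break  # optional cap on how far up to look
        for name in tree.get(str(folder), []):
            examined += 1
            candidate = PurePosixPath(name)
            if candidate.suffix.lower() in MASTER_EXTS and candidate.stem == version.stem:
                return str(folder / name), examined
    return None, examined


tree = {
    "/photos": [],
    "/photos/2013": [f"IMG_{n}.nef" for n in range(20000)],  # sibling: never examined
    "/photos/2014": ["IMG_1.nef"],
    "/photos/2014/prints": ["IMG_1.tif"],
}

print(find_master_upwards(tree, "/photos/2014/prints/IMG_1.tif"))  # ('/photos/2014/IMG_1.nef', 2)
```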


Mario

IMatch would have to go up one or more levels, and then search the sub-folders. So we end up at least in c:\images, with the 20,000 files.

Before we spend too much time on this, try things out when 5.1.14 is released. Then we can re-open this discussion. I don't have time right now.

Ferdinand

I don't know what we were arguing about.   :o

My initial question was: will this work in 114 with all the changes in detecting versions? And the answer should have been yes, because it does.

Thank you!!