File relations - Splitting a complex rule to more simple ones OK?

Started by bekesizl, January 26, 2021, 01:53:57 PM


bekesizl

I am revising my versioning rules and was thinking about restructuring them.
My question: would splitting one complex rule into several simpler ones be likely to cause a performance issue?

It would be easier to maintain rules like the simplified ones.

Original rule (buddy file)
\.(cr3)$
/^_*//
^(_*{name})[+\-_]*[0-9|a-z]*\.(jpg|jpeg|dng|tif|tiff|cr3.dop|on1|dng.dop)$


It would be replaced by the following 4 rules:

Simplification - Rule 1 - DxO sidecar
\.(cr3|cr2|nef|tif|tiff|jpg|jpeg|dng)$
/^_*//
{name}{ext}\.dop


Simplification - Rule 2 - ON1 sidecar
\.(cr3|cr2|nef|tif|tiff|jpg|jpeg|dng)$
/^_*//
{name}\.on1


Simplification - Rule 3 - RAW buddies/versions
\.(cr3|cr2|nef)$
/^_*//
^(_*{name})[+\-_]*[0-9|a-z]*\.(jpg|jpeg|dng|tif|tiff|afphoto)$


Simplification - Rule 4 - JPG buddies/versions (for my older/smartphone photos)
\.(jpg|jpeg)$
/^_*//
^(_*{name})[+\-_]*[0-9|a-z]*\.(jpg|jpeg|dng|tif|tiff|heic|afphoto)$


Mario

The regular expression engine in IMatch is very fast.
Splitting into multiple rules would most likely cause performance degradation, because IMatch would have to apply the regular expression to 3 times as many file names, once for each rule you create.
Unless you have other reasons to apply different rules to different types of versions, one rule is best.
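The counting argument can be sketched roughly as follows. This is not how IMatch is implemented; it is a toy model that assumes every rule's master expression is applied once to every file name in scope. The function name `count_master_checks` and the sample file names are made up for illustration, and the master expressions are the ones posted above.

```python
import re

def count_master_checks(filenames, master_patterns):
    """Return (regex_applications, master_matches) for a set of rules.

    Toy model: each master expression is tried against each file name.
    """
    # Assumption: matching is case-insensitive, like Windows file names.
    compiled = [re.compile(p, re.IGNORECASE) for p in master_patterns]
    applications = 0
    master_matches = 0
    for pattern in compiled:
        for filename in filenames:
            applications += 1
            if pattern.search(filename):
                master_matches += 1
    return applications, master_matches

# Hypothetical folder content
files = ["a.cr3", "a.jpg", "a.cr3.dop", "b.nef", "b_v1.tif", "notes.txt"]

single_rule = [r"\.(cr3)$"]
split_rules = [
    r"\.(cr3|cr2|nef|tif|tiff|jpg|jpeg|dng)$",  # DxO sidecar
    r"\.(cr3|cr2|nef|tif|tiff|jpg|jpeg|dng)$",  # ON1 sidecar
    r"\.(cr3|cr2|nef)$",                        # RAW buddies/versions
    r"\.(jpg|jpeg)$",                           # JPG buddies/versions
]

print(count_master_checks(files, single_rule))
print(count_master_checks(files, split_rules))
```

In this model the number of regex applications grows with the number of rules, and broader master expressions also produce more masters, each of which then needs its link expression evaluated.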

bekesizl

Thank you!
I was hoping that would not be the case, but it looks like I will have to combine those rules into fewer, more complex ones.

Mario

Rules like \.(cr3|cr2|nef|tif|tiff|jpg|jpeg|dng)$ as master expressions are quite complex (IMatch must apply this to all files in the "changed" scope every time relations need updating, which happens very often, e.g. when files are added or updated).

Depending on the database size, this means hundreds of thousands of checks. You can see the runtime in the log file by searching for CIMRelationManager::UpdateRelations. It shows how many versions and masters were processed, how many files were checked and the execution time.
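If you want to pull those entries out of a large log file, a small script can do the search. This is a minimal sketch: the function name `relation_update_lines` is made up, the log path is whatever your installation uses, and the UTF-8 encoding is an assumption. It only filters lines containing the marker and makes no assumption about what the rest of each line looks like.

```python
from pathlib import Path

MARKER = "CIMRelationManager::UpdateRelations"

def relation_update_lines(log_path, marker=MARKER):
    """Return all lines of the log file that contain the marker text."""
    # Assumption: the log is readable as UTF-8; undecodable bytes are replaced.
    text = Path(log_path).read_text(encoding="utf-8", errors="replace")
    return [line for line in text.splitlines() if marker in line]

# Usage (the path is a placeholder, not a real IMatch location):
# for line in relation_update_lines(r"C:\path\to\imatch_log.txt"):
#     print(line)
```

Comparing the reported execution times before and after a rule change gives a direct measurement on your own database.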

thrinn

Maybe set up both variants in parallel (few more complex rules vs. more different but easier rules) but make sure to always deactivate one set. Then you can test both approaches, checking against the logged run time as Mario said. I assume that deactivated rules do not have any performance impact.

I use different rules myself and did not experience any performance issues. But my database is small (< 30.000 files), and my computer quite new, so your mileage may vary.

Just as a side note: I find it difficult to "read" RegExp without trying, but wouldn't your simplification rule 4 make a JPG file a version of itself?
Thorsten
Win 10 / 64, IMatch 2018, IMA

Mario

Quote: "rule 4 make a JPG file a version of itself?"

Looks like it.
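This can be checked outside IMatch by expanding the link expression by hand and testing it against a few file names. The sketch below is only an approximation: it assumes {name} is the master's file name without extension after the replacement /^_*// has stripped leading underscores, and that matching is case-insensitive. The helper `build_link_regex` and the file names are hypothetical.

```python
import re

def build_link_regex(master_filename, link_template):
    """Expand {name} in a link expression for one master file."""
    # Assumption: {name} = file name without extension,
    # with leading underscores removed by the replacement /^_*//.
    stem = master_filename.rsplit(".", 1)[0]
    name = re.sub(r"^_*", "", stem)
    pattern = link_template.replace("{name}", re.escape(name))
    # Assumption: case-insensitive matching.
    return re.compile(pattern, re.IGNORECASE)

# Link expression of simplification rule 4, as posted
RULE_4_LINK = r"^(_*{name})[+\-_]*[0-9|a-z]*\.(jpg|jpeg|dng|tif|tiff|heic|afphoto)$"

master = "IMG_1234.jpg"
link_regex = build_link_regex(master, RULE_4_LINK)

candidates = ["IMG_1234.jpg", "IMG_1234_v2.jpg", "IMG_1234-edit.tif", "IMG_9999.jpg"]
matches = [f for f in candidates if link_regex.search(f)]
print(matches)
```

Because both `[+\-_]*` and `[0-9|a-z]*` may match nothing, the expanded expression also accepts the master's own file name. As a side observation, the `|` inside `[0-9|a-z]` is a literal pipe character, not an alternation.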

bekesizl

Thank you Thorsten, testing multiple rule sets for performance is a good idea.

Regarding rule 4: it was only a quick illustration of another rule structure, and I had not tested it.
A JPG version of a JPG is OK in itself, for example when editing a JPG in a lossless editor and exporting it with a string appended to the filename.
But I should probably change this rule so that the version's name has to differ from the original (some string appended). That said, these rules operate in the "Master folder", so the filesystem takes care of this anyway.

Carlo Didier

If you don't notice a performance degradation, I would go for separate rules.
Those would be simpler to understand, debug and maintain.

bekesizl

I ended up with the following rules, in the following order.

It took some time to redo all existing relations in my database (over 70.000 files), but the processing time when adding some new files is alright.

DxO Sidecar (buddy)
\.(cr3|cr2|nef|tif|tiff|jpg|jpeg|dng|rw2|raf|srw|arw)$
/^_*//
^{name}{ext}\.dop$


ON1 sidecar (buddy)
\.(cr3|cr2|nef|tif|tiff|jpg|jpeg|dng|rw2|raf|srw|arw)$
/^_*//
^{name}\.on1$


Mylio XMP files (buddy) - Workaround for application compatibility
\.(tif|tiff|jpg|jpeg|dng)$
/^_*//
^{name}\.xmp$


JPG (buddy+version)
\.(jpg|jpeg)$
/^_*//
^{name}[+\-_]+.*\.(jpg|jpeg|dng|tif|tiff|afphoto|psd|heic)$


RAW (buddy+version)
\.(cr3|cr2|nef|rw2|raf|srw|arw)$
/^_*//
^(_*{name})[+\-_]*[0-9|a-z]*\.(jpg|jpeg|dng|tif|tiff|afphoto|psd|heic)$
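For anyone who wants to try the JPG (buddy+version) rule outside IMatch, here is a rough sketch. It assumes {name} is the master's file name without extension after the /^_*// replacement, and that matching ignores case. The function `links_to` and the sample file names are made up.

```python
import re

# Link expression of the JPG (buddy+version) rule, as posted
FINAL_JPG_LINK = r"^{name}[+\-_]+.*\.(jpg|jpeg|dng|tif|tiff|afphoto|psd|heic)$"

def links_to(master_filename, candidate, link_template=FINAL_JPG_LINK):
    """Return True if candidate would match the link expression for this master."""
    # Assumption: {name} = file name without extension, leading underscores stripped.
    stem = master_filename.rsplit(".", 1)[0]
    name = re.sub(r"^_*", "", stem)
    pattern = link_template.replace("{name}", re.escape(name))
    # Assumption: case-insensitive matching.
    return re.search(pattern, candidate, re.IGNORECASE) is not None

master = "IMG_1234.jpg"
for candidate in ["IMG_1234.jpg", "IMG_1234_v2.jpg",
                  "IMG_1234-final edit.psd", "IMG_12345_v2.jpg"]:
    print(candidate, links_to(master, candidate))
```

Since `[+\-_]+` requires at least one separator right after the name, the master no longer matches itself, and a different file whose name merely starts with the same characters (IMG_12345_v2.jpg) is not picked up either.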

Mario

If this is what you really need...

Be careful with metadata produced by Mylio. The XMP records I have seen are pretty basic and only contain the small subset of XMP fields Mylio knows about.