Version name delimitation

Started by HansEverts, January 04, 2014, 08:00:17 PM

HansEverts

I looked for my question in the Help file, but unless I take a course, regular expressions, except for the most basic ones, are still a mystery for me. I try and think I master a couple of things, but when I try again a month later, I have to start all over again.

My question is the following: the name structure of my files is YYYYMM ####. When I edit I add something to the basic structure, like YYYYMM #### panorama. But then, this file is no longer recognized as version of the first one, when that is based on names.

I would need an expression that focuses on the first 11 characters of the file names. If the first 11 characters are the same, there is a master-version relationship.

Is that possible?

Thanks


joel23

Quote from: HansEverts on January 04, 2014, 08:00:17 PM
I looked for my question in the Help file, but unless I take a course, regular expressions, except for the most basic ones, are still a mystery for me. I try and think I master a couple of things, but when I try again a month later, I have to start all over again.

My question is the following: the name structure of my files is YYYYMM ####. When I edit I add something to the basic structure, like YYYYMM #### panorama. But then, this file is no longer recognized as version of the first one, when that is based on names.

I would need an expression that focuses on the first 11 characters of the file names. If the first 11 characters are the same, there is a master-version relationship.

Is that possible?

Thanks
Problem seems to be the blanks you use.
Just a quick shot, this should do: ^(_*{name})[+\- _]*[0-9|a-z]*\.(PSD)$
Mind the blank after +\- and the file extension when copying.  Not tested for side effects ;-)
regards,
Joerg

HansEverts

Thanks Jorg, it seems to work. I had a master and version called "201310 0001.NEF/JPG". I renamed the version into "2013 0001_test.JPG" and after some trial and error the file was detected and included in the relationship. I have not searched for side effects yet.

I will see if I can understand why it works so that next time I can do it myself.

Thanks and best regards

Mario

What you do is a pretty standard naming schema. Your master file name is something like

20140101.RAW and your versions are named like
20140101.TIF
20140101_web.JPG
20140101_test.JPG
20140101_needs_some_work_0129123.PSD

you can use a very simple regular expression to catch them all:

{name}.*\.(jpg|jpg|psd|tif)$

{name} resolves to the file name (without extension) of the master.
.* means "any number of any character" and covers whatever follows the file name
\. means dot .
(jpg|jpg|psd|tif)$ means "ends with one of these extensions"
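
For readers who want to try this outside IMatch, here is a minimal Python sketch of the same idea. The function name `is_version`, the escaping of the master name and the case-insensitive matching are assumptions made for illustration; IMatch performs the {name} substitution internally and its behaviour may differ in details.

```python
import re

# Link expression as posted above; {name} is the IMatch placeholder.
LINK_EXPR = r"{name}.*\.(jpg|jpg|psd|tif)$"


def is_version(master_name: str, candidate: str) -> bool:
    """Return True if `candidate` looks like a version of the master."""
    # Assumption: emulate IMatch by pasting the master name into the expression.
    # re.escape keeps characters like "." or "+" in a name from acting as regex syntax.
    pattern = LINK_EXPR.replace("{name}", re.escape(master_name))
    # Assumption: matching is case-insensitive, so .JPG and .jpg both qualify.
    return re.search(pattern, candidate, re.IGNORECASE) is not None


for candidate in [
    "20140101.TIF",
    "20140101_web.JPG",
    "20140101_needs_some_work_0129123.PSD",
    "20140102_web.JPG",  # different base name, should not be linked
]:
    print(candidate, is_version("20140101", candidate))
```

Only the last file name is rejected, because it does not start with the master name.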

I'm no regular expression master myself. But the simple things are quite easy, and that's all we need in IMatch most of the time. Unless you have a very weird naming schema or you need to be very restrictive.

HansEverts

With that one I have no problem, but ^(_*{name})[+\- _]*[0-9|a-z]*\.(PSD)$ is a different story.

I wanted an expression based on file name, but which only looks at the first 11 characters of the file name, leaving me the freedom to add certain text. The expression above seems to do the job, but I admit I couldn't explain why.

Mario

^(_*{name})[+\- _]*[0-9|a-z]*\.(PSD)$ is a quite restrictive expression. For example, it does not allow for blanks (spaces) in the file name.

Using your example file names,

Version file names like

20140101abcd_web.jpg will match, but
20140101abcd web.jpg will not.

Note that the second file name uses a blank before the web! This is not covered by your expression. You only allow 0-9 and a-z but not blanks!

HansEverts

I have been testing a bit.

201410 0001.NEF matches with 201410 0001_test.JPG, but also with 201410 0001_test.JPG
201410 0001 test.NEF matches with 201410 0001 test.JPG
201410 0001 tested.NEF does not match with 201410 0001 test.JPG
201410 0001 test.NEF matches with 201410 0001 tested.JPG
201410 0001 hans.NEF does not match with 201410 0001 test.JPG

This means that the expression ^(_*{name})[+\- _]*[0-9|a-z]*\.(PSD)$ does not meet my requirements, because I want the text behind the #### to be free. In this case it seems the text in the NEF file has to match the text in the JPG, although the latter can exceed the former.

I want it like this:

201410 0001 piet.NEF should match 201410 0001 hans.jpg, but
201410 0002 piet.NEF should not match 201410 0001 hans.jpg

The first 11 characters should match.

herman

Before digging very deep into regex, have you tried to use "Exif Timestamp" as the link instead of file-naming wizardry?
If that works (there was a recent bugfix in this area!) that may be all you need.....

Enjoy!

Herman.

joel23

Quote from: HansEverts on January 05, 2014, 09:55:31 PM
I have been testing a bit.

201410 0001.NEF matches with 201410 0001_test.JPG, but also with 201410 0001_test.JPG
201410 0001 test.NEF matches with 201410 0001 test.JPG
201410 0001 tested.NEF does not match with 201410 0001 test.JPG

201410 0001 test.NEF matches with 201410 0001 tested.JPG
201410 0001 hans.NEF does not match with 201410 0001 test.JPG
Of course the last one does not match, because the main filename is different. When I suggested this string, I concentrated on your blanks, not on what is behind the 11th character.
Quote:
This means that the expression ^(_*{name})[+\- _]*[0-9|a-z]*\.(PSD)$ does not meet my requirements, because I want the text behind the #### to be free. In this case it seems the text in the NEF file has to match the text in the JPG, although the latter can exceed the former.

I want it like this:

201410 0001 piet.NEF should match 201410 0001 hans.jpg, but
201410 0002 piet.NEF should not match 201410 0001 hans.jpg

The first 11 characters should match.
I understand what you want. But this is not easy to achieve.

By the strings Mario and I suggested
"201410 0001 tested.NEF" would match with "201410 0001 tested-test.JPG" or "201410 0001 tested-WHATEVER.JPG"
"201410 0001 hans.NEF" would match with "201410 0001 hans-test.JPG" or "201410 0001 hans-WHATEVER.JPG"
"201410 0001 piet.NEF" would match with "201410 0001 piet-WHATEVER.JPG"

"201410 0001.NEF" would match with "201410 0001-WHATEVER.JPG" if the other settings fit.

"Regular expressions" are a rather wide concept (the details depend on the OS or application). They should be able to achieve what you want, but I am not sure if IM does fully support them.
regards,
Joerg

HansEverts

Herman, thanks for the suggestion. I agree that sounds more straightforward, but I wonder what the pitfalls are, if any.

Jorg,
Quote:
By the strings Mario and I suggested

I am not sure what you mean.
You suggested ^(_*{name})[+\- _]*[0-9|a-z]*\.(PSD)$ (with JPG|TIF instead of PSD) in your first response. That is the one I was testing and where for example

201410 0001 tested.NEF does not match with 201410 0001 test.JPG

Mario mentions {name}.*\.(jpg|jpg|psd|tif)$, but I am not sure that was meant as a proposal. I just tried it with the following result:

201410 0001 test.NEF does not match with 201410 0001.JPG

Unless I am doing something wrong, both expressions do not cover my requirement in that the first 11 characters should be determinant for matching or not, whatever I add after them.

Best regards

Mario

Quote from: joel23 on January 05, 2014, 11:38:22 PM
but I am not sure if IM does fully support them.
IMatch fully supports Perl style regular expressions based on the reference implementation in the Boost library.

Mario

Quote from: HansEverts on January 06, 2014, 06:54:22 AM
201410 0001 tested.NEF does not match with 201410 0001 test.JPG
Mario mentions {name}.*\.(jpg|jpg|psd|tif)$, but I am not sure that was meant as a proposal. I just tried it with the following result:
201410 0001 test.NEF does not match with 201410 0001.JPG
That's to be expected.

Using your example file names:

201410 0001 test.NEF
201410 0001.JPG

and my sample expression:

{name}.*\.(jpg|jpg|psd|tif)$

When IMatch tries to find a match, it first replaces {name} with the name of the current master file it tests, so the expression becomes:

201410 0001 test.*\.(jpg|jpg|psd|tif)$

This will find all files starting with "201410 0001 test" and hence it will never find your version file starting with only "201410 0001".
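
The substitution step can be reproduced in a few lines of Python. This is only a sketch: the helper names `expand` and `matches` are made up for illustration, the master name is pasted in unescaped (the example names contain no regex metacharacters), and case-insensitive matching is an assumption.

```python
import re

LINK_EXPR = r"{name}.*\.(jpg|jpg|psd|tif)$"


def expand(master_name: str) -> str:
    # Plain text substitution, done before the regex engine sees the pattern.
    return LINK_EXPR.replace("{name}", master_name)


def matches(master_name: str, candidate: str) -> bool:
    # Assumption: case-insensitive matching so that .JPG is accepted.
    return re.search(expand(master_name), candidate, re.IGNORECASE) is not None


# The master name is longer than the version name: no match is possible.
print(expand("201410 0001 test"))
print(matches("201410 0001 test", "201410 0001.JPG"))

# The usual direction, where the version name extends the master name, works.
print(matches("201410 0001", "201410 0001 web.JPG"))
print(matches("201410 0001", "201410 0001-001.TIF"))
```

The first call to `matches` fails because the expanded pattern demands the literal text "201410 0001 test" somewhere in the candidate name.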

Usually users just append some characters to the master file name when they create version file names, e.g.

201410 0001.NEF -> 201410 0001 web.JPG -> 201410 0001-001.TIF  or something. For such derivative file names, my expression works.

But if you use master file names which differ significantly from the version file names ("201410 0001 test" to "201410 0001"), you will be in trouble. This cannot be handled with regular expressions. The {name} token is always replaced with the complete file name of the master file; there is no option to say "use only the first 11 characters of the name". This would be a feature request.

sinus

Hi Hans

I am not sure, how to resolve your problem.
I guess, the feature request from Ben would solve your problem:

https://www.photools.com/community/index.php?topic=1455.0

Generally, I would think about using a better naming system.
For example, blanks can be used, but it is better to avoid them.

And if you use IMatch or not, if you want this:

I want it like this:

201410 0001 piet.NEF should match 201410 0001 hans.jpg, but
201410 0002 piet.NEF should not match 201410 0001 hans.jpg

The first 11 characters should match.


If you used ONE character that is not in use anywhere else in your file names, you could use this character as a separator. Everything that comes after this character would be a version, because you could tell IMatch this. If you used, for example, an _ (underscore) as such a separator, all your problems would be solved.

201410 0001_piet.NEF WOULD match 201410 0001_hans.jpg, but
201410 0002_piet.NEF WOULD NOT match 201410 0001_hans.jpg
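
The separator rule itself can be sketched in Python as follows. This only illustrates the naming convention and says nothing about how IMatch would be configured; `base_key` and `same_group` are hypothetical helper names.

```python
from pathlib import PurePath


def base_key(filename: str, sep: str = "_") -> str:
    """Return the part of the file name (without extension) before the separator."""
    stem = PurePath(filename).stem
    # Split at the first separator only; names without a separator are returned whole.
    return stem.split(sep, 1)[0]


def same_group(a: str, b: str) -> bool:
    """Two files belong together when their base keys are identical."""
    return base_key(a) == base_key(b)


print(same_group("201410 0001_piet.NEF", "201410 0001_hans.jpg"))
print(same_group("201410 0002_piet.NEF", "201410 0001_hans.jpg"))
```

The rule only works if the separator character never appears in the base part of any file name.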


Best wishes from Switzerland! :-)
Markus

sinus

Why not use a "safe" filenaming-system like this one (without blanks and with a delimiter) _


201410-0001_piet.NEF
versions:

201410-0001_piet-a.jpg
201410-0001_piet-und-hans.jpg
201410-0001_hans.jpg


201410-0002_piet.NEF
versions:

201410-0002_piet-a.jpg
201410-0002_piet-und-hans.jpg
201410-0002_hans-oder-judith.jpg

or even better:

master:
201410-0001-piet.NEF

versions:
201410-0001-piet_holger.jpg
201410-0001-piet_hans.jpg

master:
201410-0002-piet.NEF

versions:
201410-0002-piet_holger.jpg
201410-0002-piet_hans.jpg

(If there is no _ in a filename, the file is NOT a version.
If there is a _ in a filename, the file is a version, and its master always has the same characters as the version has before the underscore.)

I guess most users use a system like this.
Best wishes from Switzerland! :-)
Markus

herman

Quote from: HansEverts on January 06, 2014, 06:54:22 AM
Herman, thanks for the suggestion. I agree that sounds more straightforward, but I wonder what the pitfalls are, if any.

Depends on your images and the software you use to derive versions.

IF all masters contain an Exif time stamp AND the software you use to derive versions leaves the Exif time stamp intact THEN this should work ELSE use File Name and regex ;)

I have been up to my eyebrows in regex, trying to brew something that meets your requirement.
So far no luck  :'(
Enjoy!

Herman.

BenAW

Quote from: herman on January 06, 2014, 11:39:52 AM
Depends on your images and the software you use to derive versions.

IF all masters contain an Exif time stamp AND the software you use to derive versions leaves the Exif time stamp intact THEN this should work ELSE use File Name and regex ;)

I have been up to my eyebrows in regex, trying to brew something that meets your requirement.
So far no luck  :'(
I'm using the EXIF date time without any problems  :)

Not into regex much, but using .{18} in the RegexCoach seems to give the result I would want if using the first 18 characters of a filename.

Mario

Repeaters like .{18} cannot be used to "split" the {name} tag. The {name} tag is an IMatch extra which represents the name of the master.

herman

Quote from: BenAW on January 06, 2014, 01:30:33 PM
I'm using the EXIF date time without any problems  :)
So am I.
Almost  8)
I have a handful of files without Exif time stamp, I have to version them manually.
And, when I configure the IMatch batch processor better, next time these exported files will have an Exif time stamp  ;)

As far as regex is concerned: Hans has a pretty complex naming structure.
When I understand things correctly, if this is the master file

201410 0001 piet.NEF

Only the "201410 0001" part (shown in red in the original post) is to be used for linking, the rest is to be ignored.
It seems like we have to slice a part of the master file name to have something to build a link on.

I thought that this might do the trick, but it does not, don't know why yet....

Master file expression: \.(nef)$
Replacement expression: nothing
Link Expression: substr({name},0,10).*\.(jpg)$

When Mario is reading this: are substrings implemented in IMatch?
When they are I am doing something wrong here.....



Enjoy!

Herman.

BenAW

Quote from: herman on January 06, 2014, 02:23:33 PM
Master file expression: \.(nef)$
Replacement expression: nothing
Link Expression: substr({name},0,10).*\.(jpg)$

In the VarToy app this is working: {File.Name|substr:0,18}



herman

The expression tester of the File Relations window returns a  "No Match".
I don't see why.
I have seen it giving a warning when an expression is invalid, but it does not here.
Maybe someone better educated in regex can have a look at this?

Enjoy!

Herman.

Mario

You cannot use variables here. Only the documented {name} {ext} and {filename} are supported.

herman

Thanks Mario.
Seems fair enough.

@Hans Everts:
Apparently what you require can not be done now.
My advice would be:
- First of all: try Exif Timestamp as a linking mechanism, see if it picks up all your versions.
- If that does not work out: review and change your file naming to something that can be handled with the current IMatch capabilities.
- If that is not possible: file a feature request.

Most of the time IMatch can be configured to fit a workflow.
Occasionally one may have to adjust the workflow so that it can be supported by IMatch....

Hope this helps.
Enjoy!

Herman.

HansEverts

Thank you all for the comments, this is very useful.

I don't think my naming structure is overly complicated: YYYYMM ####
I think my initial request to be able to add text to a version was reasonable, but what was unreasonable was that the master should also be modifiable. That does not make sense and defeats the purpose of the whole exercise.

So for me there are 2 options: 1) I use Mario's expression {name}.*\.(jpg|jpg|psd|tif)$, which does what I need or 2) I switch to EXIF time stamps. At this point I go for the first option, but I will certainly try the second. A disadvantage of the time stamp is that I have a few thousand files which are scans from slides. They have the time stamp of the scan.

Thanks for your help.

Mario

The matching system implemented in IMatch 5 is very powerful and flexible.
It failed for one situation: When the user uses shorter names for the version than for the master.
Example:

20140101orignal.raw
20140101web.jpg

This cannot be matched because {name} always returns the entire master name ('20140101orignal').
I can imagine several workflow scenarios where this may cause problems.

I hence made a small change in the matching. A user can now append a postfix in the form :n to the {name},{ext} and {filename} patterns. n is a number > 0 which specifies how many characters of the left of the original file name should be used to replace the pattern. For example:

{name}:8

means: Replace {name} with the first 8 characters of the original file name.

The mask

^(_*{name}:8)[+\-_]*[0-9|a-z]*\.(jpg|tif)$

becomes

^(_*20140101)[+\-_]*[0-9|a-z]*\.(jpg|tif)$

before the regular expression is interpreted. This now allows us to handle version file names which are shorter than the master file name but share a common prefix. I'm sure there will be some users with weirder naming schemes which would require "substring of master file name". We'll look into this when it comes up.
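
The effect of the :n postfix can be imitated in Python as a rough sketch. Everything here is an assumption for illustration: the helper names `expand` and `matches`, handling only {name} (not {ext} or {filename}), pasting the name in unescaped, and case-insensitive matching. The expression {name}:11.*\.(jpg|tif)$ in the second part is a hypothetical combination of the postfix with the simple expression from earlier in the thread, not something confirmed for IMatch.

```python
import re
from pathlib import PurePath

# {name} optionally followed by :n, where n is a number of characters.
TOKEN = re.compile(r"\{name\}(?::(\d+))?")


def expand(mask: str, master_file: str) -> str:
    """Replace {name} or {name}:n with the (possibly truncated) master name."""
    name = PurePath(master_file).stem  # file name without extension

    def substitute(m: re.Match) -> str:
        # With a :n postfix keep only the first n characters of the name.
        return name[: int(m.group(1))] if m.group(1) else name

    return TOKEN.sub(substitute, mask)


def matches(mask: str, master_file: str, candidate: str) -> bool:
    # Assumption: case-insensitive matching.
    return re.search(expand(mask, master_file), candidate, re.IGNORECASE) is not None


MASK = r"^(_*{name}:8)[+\-_]*[0-9|a-z]*\.(jpg|tif)$"
print(expand(MASK, "20140101orignal.raw"))
print(matches(MASK, "20140101orignal.raw", "20140101web.jpg"))

# Hypothetical expression for the "first 11 characters" requirement.
PREFIX_MASK = r"{name}:11.*\.(jpg|tif)$"
print(matches(PREFIX_MASK, "201410 0001 piet.NEF", "201410 0001 hans.jpg"))
print(matches(PREFIX_MASK, "201410 0002 piet.NEF", "201410 0001 hans.jpg"))
```

The substitution is purely textual and happens before the regular expression is compiled, which is why the digits after the colon never reach the regex engine.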

BenAW

Quote from: Mario on January 07, 2014, 09:06:58 AM
I hence made a small change in the matching. A user can now append a postfix in the form :n to the {name},{ext} and {filename} patterns. n is a number > 0 which specifies how many characters of the left of the original file name should be used to replace the pattern.
This fulfills this feature request: https://www.photools.com/community/index.php?topic=1455.0

Imo you can move it to the solved section  ;D

Mario

I knew there was a feature request for this, but could not find it. Thanks for the pointer.

HansEverts

I spent half an hour testing ^(_*{name}:11)[+\-_]*[0-9|a-z]*\.(jpg|tif)$ until I realized that in regular expressions the characters 11 do not mean the numeric eleven, but simply a one and another one, referring therefore to the first character rather than the 11th.

I will stick to the expression without :n, until I really need it.

herman

Quote from: HansEverts on January 07, 2014, 08:43:19 PM
I spent half an hour testing ^(_*{name}:11)[+\-_]*[0-9|a-z]*\.(jpg|tif)$ until
I am afraid the :11 means nothing yet.
Hopefully it will mean something you can use in the upcoming release 134.

Enjoy!

Herman.

Mario

The :n postfix has been added for the next build. I cannot make remote changes to IMatch installed on your system.

HansEverts