Need Regular Expression that matches filenames that end in Roman numerals

Started by DavidOfMA, November 22, 2014, 03:46:00 AM


DavidOfMA

I am trying to come up with a versioning Regular Expression so that I can match masters that end with Roman numerals to versions that end with the same Roman numeral. The problem is that I am also matching, as versions, files that end with Roman numerals that include the Roman numeral of the master. For instance, I want to match "Beach Rose I.jpg" to other versions of "Beach Rose I.jpg," such as "Beach Rose I bw edited.jpg," but NOT to "Beach Rose II.jpg," "Beach Rose III.jpg," or "Beach Rose IV.jpg". (I can't change the naming scheme because many of these files are linked into book projects and websites using the current names.)

I have been trying to create a Regular Expression that says "Match {name} followed by any characters except the uppercase characters [IVXCM]" but I apparently don't understand the RegEx syntax sufficiently, as each attempt I've made has failed, and poring over RegEx tutorials hasn't yielded anything so far.

I'm sure I'm missing something obvious. Anyone here have a suggested RegEx string? I would think something like this would work: ^(_*{name})^[IVXCM].*\.(jpg|jpeg|tif|psd|png)$

but that doesn't match any of the files. Any help appreciated.
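A likely reason nothing matches: outside square brackets, `^` means "start of the string", not "NOT". Only as the first character inside a bracket expression, as in `[^IVXCM]`, does it negate. The sketch below uses Python's `re` module purely as a stand-in for IMatch's engine. The helper `expand` is hypothetical and assumes `{name}` is replaced by the master's file name without extension.

```python
import re

def expand(template: str, name: str) -> str:
    """Hypothetical stand-in for IMatch's {name} placeholder. re.escape keeps
    characters such as '.' or '(' in a file name from acting as regex syntax."""
    return template.replace("{name}", re.escape(name))

# The attempt from the post above, unchanged.
attempt = r"^(_*{name})^[IVXCM].*\.(jpg|jpeg|tif|psd|png)$"
pattern = expand(attempt, "Beach Rose I")

for candidate in ["Beach Rose I.jpg", "Beach Rose I bw edited.jpg", "Beach Rose II.jpg"]:
    # Once {name} has been consumed we are no longer at the start of the
    # string, so the second ^ can never succeed.
    print(candidate, re.search(pattern, candidate) is not None)  # False each time
```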

Thanks,
David

thrinn

Hi David,
do your version file names always contain a space after the original file name? Then you could try:

^(_*{name}) .*\.(jpg|jpeg|tif|psd|png)$

Note the space after the ^(_*{name}) part.
This would match all files starting with the original file name, optionally prefixed by any number of underscores, followed by a space, followed by any number of characters.
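A quick way to see what this pattern accepts, sketched with Python's `re` as a stand-in for IMatch's engine and assuming `{name}` becomes the master's base name. Because a space is required right after the name, a version whose name is identical to the master (no suffix at all) is not matched either.

```python
import re

# Assumption: IMatch replaces {name} with the master's base name.
name = re.escape("Beach Rose I")
pattern = "^(_*" + name + r") .*\.(jpg|jpeg|tif|psd|png)$"

for candidate in [
    "Beach Rose I bw edited.jpg",  # name + space + suffix: matches
    "_Beach Rose I web.png",       # leading underscore allowed: matches
    "Beach Rose II.jpg",           # 'I' instead of a space: no match
    "Beach Rose I.jpg",            # same name as the master: no match
]:
    print(candidate, re.search(pattern, candidate) is not None)
```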

Regards,
Thorsten
Win 10 / 64, IMatch 2018, IMA

ubacher

Here is what I would suggest:
After the file name you want at least one character other than the Roman-numeral letters [IVXLCDM], followed by any number of other characters (.*).

To specify this, try {name}[0-9a-zABEFGHJKNOPQRSTUWYZ]{1}.*

You will also want to add special characters such as dash, underscore, and blank, which can follow the Roman numeral.
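Here is a sketch of this suggestion, again using Python's `re` only as a stand-in for IMatch's engine. It uses a variant of the class that excludes all seven Roman-numeral letters (I, V, X, L, C, D, M) and adds space, underscore, and dash as suggested above; `{1}` means "exactly one" of the class. The value of `{name}` is assumed.

```python
import re

name = re.escape("Beach Rose I")  # assumed value of {name}
# One digit, lowercase letter, space, underscore, dash, or uppercase letter
# other than I, V, X, L, C, D, M (a trailing '-' in a class is literal).
allowed = "[0-9a-zABEFGHJKNOPQRSTUWYZ _-]"
pattern = "^" + name + allowed + "{1}.*$"

for candidate in ["Beach Rose I bw edited.jpg", "Beach Rose I_bw.jpg",
                  "Beach Rose II.jpg", "Beach Rose IV.jpg", "Beach Rose I.jpg"]:
    print(candidate, re.search(pattern, candidate) is not None)
```

The last name prints `False`: right after the name comes `.`, which is not in the class, so a version that has exactly the master's name is not matched.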

DavidOfMA

@ubacher
Thanks. This seems to be heading in the right direction, except that many of the versions have the same file name as the original file. For instance, the CMYK version of "Beach Rose II.jpg" is also called "Beach Rose II.jpg," but it is in a different folder. I need to match those files too.

That's why I was trying to come up with a "NOT" type expression. What I need is "name followed by ZERO OR MORE characters that are not IXVCM". Using your method, is there, instead, a way to specify "name followed by ZERO OR MORE of the following characters"? It's the ZERO OR MORE part that confuses me. If I substitute {0} where you have {1}, I again match everything that begins with "name," which I don't want to do.

I don't know why this seems so confusing to me. Maybe it's the postfix-style notation, but I can't quite wrap my mind around it, and each attempt I have made fails either by not matching "Beach Rose I" to "Beach Rose I" or by also matching it to "Beach Rose II."
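On the `{0}` question: a quantifier `{n}` repeats the preceding item exactly n times, so `{0}` removes the class from the pattern entirely and what is left is "name, then anything". What "zero or more" needs here is an optional tail: either the extension follows the name directly, or the first character after the name is not a Roman-numeral letter. A sketch with Python's `re` as a stand-in (the `{name}` value is assumed; a negative lookahead such as `(?![IVXLCDM])` right after the name would be an equivalent alternative):

```python
import re

name = re.escape("Beach Rose I")  # hypothetical value of {name}
not_numeral = "[^IVXLCDM]"
ext = r"\.(jpg|jpeg|tif|psd|png)$"

# {0} repeats the class zero times, i.e. removes it: "name, then anything".
zero = "^(_*" + name + ")" + not_numeral + "{0}.*" + ext
# Make the whole tail "one non-numeral character, then anything" optional.
optional = "^(_*" + name + ")(" + not_numeral + ".*)?" + ext

candidates = [
    "Beach Rose I.jpg",            # same name as the master
    "Beach Rose I bw edited.jpg",
    "Beach Rose I 600x600.jpg",
    "Beach Rose II.jpg",           # a different master
    "Beach Rose IV.jpg",           # a different master
]
for c in candidates:
    print(c, re.search(zero, c) is not None, re.search(optional, c) is not None)

# Only the first character after the name is tested, so later lowercase
# letters do not matter even when matching is case-insensitive.
print(re.search(optional, "Beach Rose I intermediate.jpg", re.IGNORECASE) is not None)  # True
```

`zero` accepts all five names; `optional` accepts the first three and rejects "Beach Rose II.jpg" and "Beach Rose IV.jpg".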

@thorsten
The names of the original file don't have spaces after the filename. "Beach Rose I" is the name of one master file, "Beach Rose II" is the name of another master file, etc. The master is not "Beach Rose."

DavidOfMA

After a couple of hours of research and experimenting, I came up with this string, which is pretty close to what I'm looking for. I just need one more tweak. This is what I've constructed so far:

^(_*{name})((?!I)(?!V)(?!X).)*\.(jpg)$

This will match "Beach Rose I" to "Beach Rose I_bw" and "Beach Rose I web" but not to "Beach Rose II" or "Beach Rose IV"

Unfortunately, it won't match "Beach Rose I intermediate" or "Beach Rose I 600x600" because these contain the letter "i" or "x". If I could tell RegEx to restrict the non-matching to only the UPPERCASE letters I, V, and X, I'd have it.  For reasons I don't understand, the regular expression engine seems to be case-insensitive. Does anyone know how to specify that it match only UPPERCASE?
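If the engine is matching case-insensitively, that alone would explain this: under ignore-case, `(?!I)` also rejects a lowercase "i", so the tempered `((?!I)(?!V)(?!X).)*` stops at the first i, v, or x. The sketch below reproduces that with Python's `re.IGNORECASE` and shows one possible fix, a scoped inline option `(?-i:...)` that turns case-insensitivity off only for the `[IVX]` test. Python supports this syntax since 3.6 and .NET documents the same `(?imnsx-imnsx:subexpression)` construct, but whether IMatch's engine accepts it is an assumption to test. The `{name}` value is also assumed.

```python
import re

name = re.escape("Beach Rose I")  # assumed value of {name}
names = ["Beach Rose I_bw.jpg", "Beach Rose I intermediate.jpg", "Beach Rose II.jpg"]

# The pattern from the post: no character after the name may be I, V or X.
david = "^(_*" + name + r")((?!I)(?!V)(?!X).)*\.(jpg)$"
# Same idea, but (?-i:...) makes only the [IVX] test case-sensitive.
scoped = "^(_*" + name + r")((?!(?-i:[IVX])).)*\.(jpg)$"

for label, pattern in (("david", david), ("scoped", scoped)):
    results = [re.search(pattern, n, re.IGNORECASE) is not None for n in names]
    print(label, results)
# david [True, False, False]
# scoped [True, True, False]
```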

Thanks,
David

jch2103

I'm too rusty with regular expressions to give you any real help, but I think the solution to your issue will involve using square brackets in the regex expression. These are used when you want to match one of a specific range of characters, e.g., [IVX].

Adding the negation character ^ will "match any single character that is not in character_group. By default, characters in character_group are case-sensitive." See http://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx for more information. In your case, that would probably mean something like [^IVX].
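The "by default" in that quote matters. As far as I know, when a pattern is applied with an ignore-case option (`RegexOptions.IgnoreCase` in .NET, `re.IGNORECASE` in Python), character classes, negated ones included, ignore case as well. A small Python sketch of the difference (Python's `re` is only a stand-in here; how IMatch configures its engine is not confirmed):

```python
import re

# Which of these single characters does the negated class [^IVX] accept?
chars = "iIvVxXa"
for label, flags in (("case-sensitive", 0), ("IGNORECASE", re.IGNORECASE)):
    accepted = [c for c in chars if re.fullmatch("[^IVX]", c, flags)]
    print(label, accepted)
# case-sensitive ['i', 'v', 'x', 'a']
# IGNORECASE ['a']
```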

Good luck!

John

DavidOfMA

Thanks. That seems to get me closer than anything else I've tried. I thought I'd tried that earlier, but I must have gotten the syntax wrong somehow. I'll test it on the database. What it misses I can probably turn into manual versions.

EDIT: No, it turns out this syntax has the same problem as my earlier attempt. It won't match versions that contain a lowercase i, v, or x anywhere in the file name. Apparently the regex engine IMatch uses is not case-sensitive, or some other issue prevents it from distinguishing between "i" and "I". Mario, if you see this, can you explain why [^IVX] matches i, v, and x?

Thanks,
David