Regular expression search not returning correct result

Started by ubacher, November 01, 2017, 01:14:27 PM

ubacher

When I search with the regular expression (using the search bar)
.*_1\..* I correctly find the files with _1 at the end of the name.

If I run the same search from a script, it also (wrongly) finds the file: 2017-10-29 094_1--095.jpg
It does not matter how I set advanced mode.
IMWS.get('v1/search/filename', {
    scope: 'idlist',
    idlist: IMatch.idlist.fileWindowFilesTotal,
    pattern: '.*_1\..*',
    advancedmode: false,
    fields: 'id,filename,name,namene,ext'
}).then(function (response) {


Am I doing something wrong or is this a bug?

Carlo Didier

Tested in RegexBuddy and it looks like your expression is correct.

sinus

Just to my understanding:

I have always had trouble with regular expressions. They get complex quickly, and it is very easy to end up with a wrong expression.
Why do people use regex?

Are they quicker? I doubt it, but I do not know.

You could in this case simply search without regex with

_1.

I tried it, and even with my 250'000 files I could not measure a difference: both expressions found 5305 files in, phew, about 1 second.

Don't get me wrong, I am only curious.
Luckily I have a good naming convention, so I can find some files easily without regex. Maybe regex lets us construct searches that would not be possible without it?
Best wishes from Switzerland! :-)
Markus

thrinn

I cannot try it right now, but in JS strings backslashes have to be doubled, don't they?
Thorsten
Win 10 / 64, IMatch 2018, IMA

ubacher

Doubling the \ did it!  Silly, I should have figured that out myself.
Thanks thrinn!
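
For anyone landing here with the same problem, a minimal sketch of why the single backslash fails. It uses plain JavaScript RegExp as a stand-in for the IMatch search (whose regex flavor may differ in details), and the file name 2017-10-28 093_1.jpg is made up for illustration.

```javascript
// In a JavaScript string literal, "\." is just an escaped dot, so the
// backslash is gone before the pattern ever reaches the regex engine.
const single = '.*_1\..*';   // actual string contents: .*_1..*
const doubled = '.*_1\\..*'; // actual string contents: .*_1\..*

// Plain JavaScript RegExp is used here as a stand-in for the IMatch search.
const wrongHit = '2017-10-29 094_1--095.jpg'; // file from the original post
const wanted = '2017-10-28 093_1.jpg';        // made-up file name ending in _1

console.log(single);                             // .*_1..*
console.log(new RegExp(single).test(wrongHit));  // true: the unescaped dot matches "-"
console.log(new RegExp(doubled).test(wrongHit)); // false: a literal "." must follow _1
console.log(new RegExp(doubled).test(wanted));   // true
```

The search bar takes the pattern as typed, which is why the same expression behaves correctly there; only the string literal in the script needs the extra backslash.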

thrinn

Quote from: sinus on November 02, 2017, 09:23:28 PM
Why do people use regex?

Are they quicker? I doubt it, but I do not know.

Regexes are much more complex than a simple text search, so I would suspect that they are in fact slower than the simple search. This does not mean that you would notice any difference in practice, mind.
But a simple text search is quite limited. For example, searching for _1. will find all files that have this combination of characters anywhere in the file name, not only those that end with them.
Another example: I also use IMatch to catalog digital documents. Their names all start with a letter, followed by 0 to 2 digits or letters, followed by a dash, followed by the year, followed by a dash again, followed by a 2-digit number, followed by an arbitrary string. So "A-2017-01 A document", "B1-2016-11 Copy of an invoice" and "V01-2017-22 Another document" are all valid filenames in this context, but "Copy of A-2017-01" is not. With a regex, I can grab all of them at once, e.g. for a filter. This is not possible with a simple text search.
Thorsten
Win 10 / 64, IMatch 2018, IMA
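
A rough sketch of a pattern for the document naming scheme described above, tested with plain JavaScript RegExp. The pattern is one possible reading of the description, not thrinn's actual expression, and whether the IMatch regex flavor accepts \d is an assumption ([0-9] is the conservative alternative).

```javascript
// Assumed reading of the naming scheme:
//   one letter, 0 to 2 further letters or digits, "-", a 4-digit year,
//   "-", a 2-digit number, then any text.
const docName = /^[A-Za-z][A-Za-z0-9]{0,2}-\d{4}-\d{2}.*$/;

const names = [
    'A-2017-01 A document',
    'B1-2016-11 Copy of an invoice',
    'V01-2017-22 Another document',
    'Copy of A-2017-01'
];

// Keep only the names that follow the scheme from their first character on.
const matching = names.filter(function (name) {
    return docName.test(name);
});
console.log(matching); // the first three names; 'Copy of A-2017-01' is rejected

// A plain substring search cannot express "only at the start":
console.log('Copy of A-2017-01'.includes('A-2017-01')); // true
```

If such a pattern is passed as a string from a script, each backslash has to be doubled, as in the fix earlier in this thread.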

sinus

Quote from: thrinn on November 02, 2017, 09:55:56 PM
Quote from: sinus on November 02, 2017, 09:23:28 PM
Why do people use regex?

Are they quicker? I doubt it, but I do not know.

Regexes are much more complex than a simple text search, so I would suspect that they are in fact slower than the simple search. This does not mean that you would notice any difference in practice, mind.
But a simple text search is quite limited. For example, searching for _1. will find all files that have this combination of characters anywhere in the file name, not only those that end with them.
Another example: I also use IMatch to catalog digital documents. Their names all start with a letter, followed by 0 to 2 digits or letters, followed by a dash, followed by the year, followed by a dash again, followed by a 2-digit number, followed by an arbitrary string. So "A-2017-01 A document", "B1-2016-11 Copy of an invoice" and "V01-2017-22 Another document" are all valid filenames in this context, but "Copy of A-2017-01" is not. With a regex, I can grab all of them at once, e.g. for a filter. This is not possible with a simple text search.

Thanks, Thorsten, for this detailed answer.
Interesting.
Regex are much more complex ... yes, that is true; that is why they are so powerful, but it is difficult to find the correct expression.

My filenames are structured like your document filenames, which is why I almost never need regex, I guess. For example, I of course use only one dot in my filenames.  :D
And abbreviations like _f (freigestellt, i.e. cut out) or _v (version) occur only once in each filename, so I can simply type _f into the search bar and IMatch will find all files with this meaning.

I also use restrictive filenames for my documents (not images). The only "free text" in my filenames is a short description like "matterhorn" or "martha-at-sea", which is why my filenames are quite long.
On the other hand, that is why I mostly use just the filename search: IMatch can usually find the file, and it is incredibly fast.

Best wishes from Switzerland! :-)
Markus

Carlo Didier

My filenames are also very strictly structured, but still sometimes there are situations where only a regex would do. Like selecting all images named D20170523001* to D20170525143*.
But I gave up on using regexes for calculated categories because, although simple formulas are quite fast, as soon as they become a little more complex they sometimes become extremely slow.

That's why I use an (event-driven! IM5.x!) script to automatically check the filenames of new files and add them to the corresponding category.
In the example above, I would put the text "D20170523001-D20170525143" into the description of the category so the script knows which images to assign to it. It would put all images taken between 23 May 2017 and 25 May 2017 (up to image 143 of that day) into the category for a short trip I made on those dates.
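
The range check described above can be sketched roughly as follows. This is not the actual script and uses no IMatch API; the function names parseRange and isInRange are made up, and the sketch assumes every name starts with D followed by an 8-digit date and a 3-digit counter.

```javascript
// Hypothetical helpers, not part of IMatch. Assumed name layout:
// "D" + YYYYMMDD + 3-digit counter, e.g. D20170523001.

// Turn a category description like "D20170523001-D20170525143" into bounds.
function parseRange(description) {
    const m = /^(D\d{11})-(D\d{11})$/.exec(description.trim());
    return m ? { from: m[1], to: m[2] } : null;
}

// The keys are fixed-width, so string comparison orders them by date and counter.
function isInRange(filename, range) {
    const m = /^D\d{11}/.exec(filename);
    return m !== null && m[0] >= range.from && m[0] <= range.to;
}

const trip = parseRange('D20170523001-D20170525143');
console.log(isInRange('D20170524017.DNG', trip));    // true
console.log(isInRange('D20170525143_a.tiff', trip)); // true
console.log(isInRange('D20170525144.jpg', trip));    // false
```

A plain comparison like this sidesteps the regex entirely, which is one way to avoid the slowdown with complex expressions.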

sinus

Quote from: Carlo Didier on November 05, 2017, 11:56:52 AM
My filenames are also very strictly structured, but still sometimes there are situations where only a regex would do. Like selecting all images named D20170523001* to D20170525143*.
But I gave up on using regexes for calculated categories because, although simple formulas are quite fast, as soon as they become a little more complex they sometimes become extremely slow.

That's why I use an (event-driven! IM5.x!) script to automatically check the filenames of new files and add them to the corresponding category.
In the example above, I would put the text "D20170523001-D20170525143" into the description of the category so the script knows which images to assign to it. It would put all images taken between 23 May 2017 and 25 May 2017 (up to image 143 of that day) into the category for a short trip I made on those dates.

Fine, of course.
You could also go to the timeline, in this case.
Best wishes from Switzerland! :-)
Markus

Carlo Didier

Quote from: sinus on November 07, 2017, 08:40:35 AM
Fine, of course.
You could also go to the timeline, in this case.

I actually never use the timeline.
First because when I have to look something up by date/time, I already have that in my folder structure. And the timeline may list files based on date created or modified but not on the real date to which they belong.
Second because for events I have categories, so I don't have to know exact dates. See attachment.

sinus

IMatch gives us several possibilities!  ;D

Good for us users!
Best wishes from Switzerland! :-)
Markus

jch2103

Quote from: Carlo Didier on November 08, 2017, 02:01:00 PM
And the timeline may list files based on date created or modified but not on the real date to which they belong.

As far as I know, the timeline sorts files by date created, unless date created isn't available, in which case it will fall back to other available data such as date modified. For example, I'm able to have my scanned files properly sorted after I assign a proper date created.
John

Mario

For images: Date created, date digitized, date last modified.
You can position a file along the time line by filling out the date created / digitized in the Metadata Panel.
Usually your camera does that. For scanned files this needs to be done manually or via a Metadata Template.
-- Mario
IMatch Developer
Forum Administrator
http://www.photools.com  -  Contact & Support - Follow me on 𝕏 - Like photools.com on Facebook

Carlo Didier

That's all fine, but: When I (re-)edit a raw image, say from 2015, and create a new TIFF or JPG from it, then none of the timestamps corresponds to the date/time the original image was taken.
With the date in the filename, like D20150911022.DNG, the new file would be something like D20150911022.jpg or D20150911022_a.tiff. The original date is still there and the file is saved to the same folder.
And with my automatic (IM5 - event driven!) script it is also automatically assigned to the corresponding event category where applicable.

Just to explain why the timeline doesn't work for me and why I don't need it. But that's just my personal workflow. For others it may well work just fine.
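
A rough sketch of reading the shooting date back out of such a filename, independent of any file timestamps. captureInfo is a made-up name, and the D + YYYYMMDD + counter layout is inferred from the examples in this post.

```javascript
// Hypothetical helper: extracts date and counter from names like D20150911022_a.tiff.
function captureInfo(filename) {
    const m = /^D(\d{4})(\d{2})(\d{2})(\d{3})/.exec(filename);
    if (!m) {
        return null; // name does not follow the assumed scheme
    }
    return {
        date: m[1] + '-' + m[2] + '-' + m[3], // ISO-style YYYY-MM-DD
        counter: Number(m[4])
    };
}

console.log(captureInfo('D20150911022.DNG'));    // { date: '2015-09-11', counter: 22 }
console.log(captureInfo('D20150911022_a.tiff')); // same date and counter
console.log(captureInfo('IMG_0001.jpg'));        // null
```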

sinus

I personally use the timeline very seldom.
I do not need it for searching or displaying, because I can do all of that in the other views.

If there were another way of viewing the images, I would use it more
(e.g. a kind of line with years and months, and the images below and above it; you have surely seen such timelines).
Best wishes from Switzerland! :-)
Markus

jch2103

Quote from: Carlo Didier on November 08, 2017, 10:31:15 PM
That's all fine, but: When I (re-)edit a raw image, say from 2015, and create a new TIFF or JPG from it, then none of the timestamps corresponds to the date/time the original image was taken.

Just curious - What software do you use to convert raw images? For me, DxO PhotoLab/Pro, Affinity, Photoshop (and if I recall correctly, Lightroom) all save output files (jpg and tiff) with the original metadata date/times.* Hence, for me, the Timeline works correctly.



(* Affinity doesn't save some other metadata such as Country, State, etc.)
John

Jingo

I was thinking the same thing.. I always check to ensure the original date/time of the exported image matches the original file so I can use that to determine when the image and modified file were originally taken.  Haven't run into a situation where the exported file didn't still have the original date/time in the exif.

Mario

Quote
That's all fine, but: When I (re-)edit a raw image, say from 2015, and create a new TIFF or JPG from it, then none of the timestamps corresponds to the date/time the original image was taken.

As the other posters wrote, this is a workflow problem. Creating a derivative image (version) should retain the original create/digitize date and time. Which means, usually, that original file and all derived images fall into the same spot in the time line.
-- Mario

Carlo Didier

Quote from: Mario on November 09, 2017, 12:03:47 AM
As the other posters wrote, this is a workflow problem. Creating a derivative image (version) should retain the original create/digitize date and time. Which means, usually, that original file and all derived images fall into the same spot in the time line.

The problem here is the "should" ... there's no guarantee. Derivatives may come from the raw converter (ACR), Photoshop, PTGui, whatever. With my workflow I don't have to care and check for each image or application whether the required metadata is correctly maintained. It just doesn't matter.

sinus

Quote from: Carlo Didier on November 09, 2017, 08:24:33 AM
Quote from: Mario on November 09, 2017, 12:03:47 AM
As the other posters wrote, this is a workflow problem. Creating a derivative image (version) should retain the original create/digitize date and time. Which means, usually, that original file and all derived images fall into the same spot in the time line.

The problem here is the "should" ... there's no guarantee. Derivatives may come from the raw converter (ACR), Photoshop, PTGui, whatever. With my workflow I don't have to care and check for each image or application whether the required metadata is correctly maintained. It just doesn't matter.

I checked this here as well: masters and versions have the same date, so they are in the same place in the timeline.

Carlo, if your workflow works, fine!

Maybe just keep in mind that workflows sometimes change, because programs change, or the OS, or whatever.
But I have no doubt that you will manage, even if something changes in the future.  :D
Best wishes from Switzerland! :-)
Markus

Carlo Didier

Quote from: sinus on November 09, 2017, 08:42:51 AM
Maybe just keep in mind that workflows sometimes change, because programs change, or the OS, or whatever.
But I have no doubt that you will manage, even if something changes in the future.  :D

Absolutely. But changes in workflows are a big thing, so it's best to be able to avoid them, especially if the change won't bring any significant benefits to compensate for the money and time invested in it.