Reg Ex for ignoring folders

Started by Rene Toepfer, March 03, 2021, 05:40:12 PM


Rene Toepfer

I have a more or less constant folder structure in which only a time stamp changes, as shown in the attached screen grab 2021-03-03_17-14-41.png.
How can IMatch ignore folders (in an already indexed structure) whose names contain unbearbeitet ("unedited") or instagram? According to [1], the regex should be unbearbeitet|instagram to ignore all folders containing these strings (like the example beach|sun). I entered it under Bearbeiten|Einstellungen|Indizierung (Edit|Preferences|Indexing; see attachment 2021-03-03_17-14-41.png).
I then ran several rescans, both standard and forced, but the folders are still not ignored.
I have also tried regexes like
\unbearbeitet|\instagram
\unbearbeitet;\instagram
\\unbearbeitet;\\instagram
*unbearbeitet*;*instagram*
but none of them worked either.
A forum search for "regular expression ignore" did not turn up any hint as to what I am doing wrong.
Where is my mistake and how can I fix it? Thanks for your support!

[1] https://www.photools.com/help/imatch/#rmh_regexp.htm?dl=h-17

Mario

Just use

instagram$;unbearbeitet$


The sample folder names you show don't start with \unbearbeitet but contain unbearbeitet somewhere in the folder name.
The two regexps above exclude all folders ending with unbearbeitet or instagram. This should do the trick nicely.

Note that changing the regexp after a folder has been added will not remove the folder from the database. In that case you need to use the "Remove folder from database" command. DO NOT confuse this with Delete folder.
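
To illustrate how the end-anchored patterns behave, here is a minimal Python sketch. It is not IMatch code: Python's re module stands in for IMatch's regex engine, splitting at semicolons mirrors the list syntax used above, and the folder paths are made-up examples. Whether IMatch tests the full path or only the folder name is also an assumption here.

```python
import re

# Semicolon-separated list of regexes, as entered in the thread.
PATTERNS = "instagram$;unbearbeitet$"


def is_ignored(folder_path: str, patterns: str = PATTERNS) -> bool:
    """Return True if any of the ';'-separated regexes matches the path."""
    return any(re.search(p, folder_path) for p in patterns.split(";") if p)


# Hypothetical folder names, for illustration only.
folders = [
    r"D:\Photos\2021-03-03_unbearbeitet",
    r"D:\Photos\2021-03-03_instagram",
    r"D:\Photos\2021-03-03_final",
    r"D:\Photos\instagram_exports\2021",
]

for folder in folders:
    print(folder, "-> ignored" if is_ignored(folder) else "-> indexed")
```

The `$` anchor means only folders whose path ends with one of the strings are skipped, so the last example (instagram in the middle of the path) would still be indexed in this sketch.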

thrinn

The indexing options only apply when new folders are added to the database. If unbearbeitet and instagram are already indexed, they will stay indexed. But it is easy to remove these unwanted (sub)folders: use the Folder Filter at the bottom of the Media & Folders view to restrict the folders to, e.g., unbearbeitet. Select the folders you do not want in your database and use the Remove from database command in the context menu.

Regarding your RegExp:
A RegExp with a single backslash will not work because the backslash has a special meaning in regular expressions. The same goes for the RegExp with asterisks.
unbearbeitet;instagram should work. Maybe use \\unbearbeitet;\\instagram instead to catch only folders starting with these strings.
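
The points above can be checked with a short Python sketch. This uses Python's re module as a stand-in, so it only approximates what IMatch's engine does: Python rejects the single-backslash and asterisk attempts outright, whereas another engine might silently interpret them differently. The folder paths are hypothetical.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Patterns tried in the thread, plus the suggested alternatives.
attempts = [
    r"\unbearbeitet",    # \u starts an escape sequence, not a literal backslash
    r"\instagram",       # \i is not a valid escape
    "*unbearbeitet*",    # '*' needs something before it to repeat
    r"\\unbearbeitet",   # escaped backslash followed by the name
    "unbearbeitet",      # plain substring match
]
for pattern in attempts:
    print(f"{pattern!r:22} compiles: {compiles(pattern)}")

# \\ matches one literal path separator, so the name must start a folder.
starts_with = re.compile(r"\\unbearbeitet")
print(bool(starts_with.search(r"D:\Photos\unbearbeitet_2021")))
print(bool(starts_with.search(r"D:\Photos\2021_unbearbeitet")))
```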

Thorsten
Win 10 / 64, IMatch 2018, IMA

Rene Toepfer

Thank you both! Your hints were very helpful and the fix worked well. Ignoring folders while indexing now also works as expected.

sinus

Quote from: Rene Toepfer on March 03, 2021, 08:00:29 PM
Thank you both! Your hints were very helpful and the fix worked well. Ignoring folders while indexing now also works as expected.

Great.
Allow me to add a remark:
I would avoid spaces and umlauts in folder names and file names.
You may have no problems for years, but usually the day comes when exactly this causes trouble.

Although it sometimes does not look as nice (I guess your surname originally has an umlaut too), it will avoid trouble in the future.

Just my 2 cents on something unrelated.
Best wishes from Switzerland! :-)
Markus

Rene Toepfer

Quote from: sinus on March 04, 2021, 08:09:17 AM
Although it sometimes does not look as nice (I guess your surname originally has an umlaut too), it will avoid trouble in the future.


Yes, I have an Ö in my surname, and also an É in my first name.
Do you really think these characters are a problem? I was under the impression that umlauts and special characters are no longer an issue with UTF. I am explicitly not saying that your warning is wrong.

sinus

I can only speak from my experience.
Theoretically there should be no more problems.  8)

But in practice I have heard and read about problems here and there.
Maybe people use a program that does not handle this correctly and run into trouble.
Or searches do not work, and so on.

And especially online, when sending files and so on, it often leads to problems.
If you search for "umlaut in filenames", you will find a lot of reports.
For example, this one from 2017:
"I am using the v2 API for http. What I found is that if a file or folder name contains a special character, like a German Umlaut like ä, I get a 400 error on upload and download. Is this a bug?"

This was only a remark for you, not from an expert but from a user who has heard and read about problems.  8)

Inside IMatch, I guess, you should not have problems. Although I have heard of problems with regex and umlauts ... but I cannot really say.

But you may be lucky and never have any problems. I certainly hope so for you.

Best wishes from Switzerland! :-)
Markus

Mario

I agree with Markus.
Especially if you work cross-platform (across operating systems), have to deal with clients in multiple countries, or work with different online services: non-ASCII characters in file names, blanks, and special characters which are allowed in Windows but not in UNIX may cause problems. UTF-8 is not supported everywhere, and there are cool applications out there which choke on file names with blanks or German umlauts!

Most press houses and image agencies, and the standards used by libraries and archives, thus require or recommend 'simple' ASCII-based file names. Usually the only special character allowed in file names is the hyphen (-).
IMatch has been fully UTF-8 / 32-bit Unicode for a long time. But, for example, the renowned ExifTool added support for non-ASCII file names only about two years ago...

File names in plain ASCII, with a date code, maybe a client id / project / photographer id, and a unique serial number are usually best. All other information is better placed in metadata.
The Renamer in IMatch makes it very easy to establish a consistent and portable naming schema. It can also be used to correct existing file names if needed.
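
For readers who want to see what such a clean-up could look like outside IMatch, here is a minimal Python sketch. It is not how the IMatch Renamer works; the function name to_ascii_name, the German transliteration table, and the choice to replace blanks with hyphens are all assumptions made for illustration.

```python
import unicodedata

# Common German transliterations (an assumption; adjust to your own rules).
GERMAN_MAP = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})


def to_ascii_name(name: str) -> str:
    """Return a plain-ASCII variant of a file name (hypothetical helper)."""
    # Compose first so that 'o' + combining diaeresis becomes a single 'ö'.
    name = unicodedata.normalize("NFC", name)
    name = name.translate(GERMAN_MAP)
    # Decompose remaining accented letters (é -> e + accent) and drop
    # everything that is not ASCII.
    name = unicodedata.normalize("NFKD", name)
    name = name.encode("ascii", "ignore").decode("ascii")
    # Replace runs of blanks with a single hyphen.
    return "-".join(name.split())


print(to_ascii_name("René Töpfer Übersicht.jpg"))
```

Characters without an ASCII decomposition are simply dropped in this sketch, so results should be reviewed before renaming real files.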

Rene Toepfer

Thanks for your detailed explanations. I usually use neither blanks nor special characters in file names, but I do use umlauts. Based on your experience I will have to review my workflow.


Edit:
My file names are created as follows: $project_$typeofdoc_$counter_$revision_$version
$project: Project number
$typeofdoc: Placeholder for booklet, poster, portrait, etc.; only a few keywords are allowed. I used numbers (booklet=10, poster=11, etc.) for a while, but clients preferred a clear name instead of a number.
$counter: To distinguish documents with different content, e.g. Poster showing cats (11_001), Poster showing dogs (11_002)
$revision: I think this should be clear
$version: I can have some intermediate versions of a document, e.g. to discuss with the client whether red text colour (would become version 01) is better than blue text colour (would become version 02)
Combined, the numbers yield the document number: 4711_11_002_02_03
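
The naming scheme above can be sketched in Python as follows. The names DocNumber, build and parse are hypothetical, and the zero-padding widths (three digits for the counter, two for revision and version) are inferred from the single example 4711_11_002_02_03, so treat them as assumptions.

```python
from typing import NamedTuple


class DocNumber(NamedTuple):
    """The five underscore-separated parts of a document number."""
    project: str
    typeofdoc: str
    counter: str
    revision: str
    version: str

    def __str__(self) -> str:
        return "_".join(self)


def build(project: int, typeofdoc: str, counter: int,
          revision: int, version: int) -> DocNumber:
    """Assemble a document number; padding widths are assumed."""
    return DocNumber(str(project), typeofdoc,
                     f"{counter:03d}", f"{revision:02d}", f"{version:02d}")


def parse(text: str) -> DocNumber:
    """Split a document number back into its parts."""
    parts = text.split("_")
    if len(parts) != 5:
        raise ValueError(f"expected 5 parts, got {len(parts)}: {text!r}")
    return DocNumber(*parts)


print(build(4711, "11", 2, 2, 3))
print(parse("4711_11_002_02_03").typeofdoc)
```

Because the parts are split at underscores, a $typeofdoc keyword must not itself contain an underscore in this sketch.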