Regular Expressions

Regular expressions are a powerful way to specify search terms and to match text. IMatch uses regular expressions in many of its features so having a basic understanding of regular expressions is very helpful. All you need to know is right here.

IMatch uses regular expressions based on the syntax used by the Perl language.
A good reference with many examples is https://www.regular-expressions.info/examples.html. On https://regex101.com/ you can try out regular expressions and get direct feedback.

This introduction concentrates on the regular expression features you need every day. If you want in-depth information about all the various regular expression elements, follow the links above.

What is a Regular Expression?

Regular expressions give you a concise and flexible way of matching text. You specify a pattern using a combination of literal (normal) text and special tokens. This forms a regular expression.

IMatch allows the use of regular expressions in many search, filter and replacement functions, often in addition to simpler "search engine-like" patterns. Regular expressions are flexible and powerful, and they let you specify precisely what you want to search for. They may look a bit strange at first, but this short introduction should give you the know-how to use them.

Regular Expression Syntax

In a regular expression (short: regex) every character matches itself, with the exception of these special characters:

.[]{}()\*+?|^$

By memorizing only a few simple rules you already know a lot about regular expressions:

Wildcards

The single character . (dot) matches any character, except if you use the . in a character set. You use the . when you want to specify "any" character. If you want to use a literal . in your pattern you must escape the dot with a leading backslash like so: \.
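The difference between the plain dot and the escaped dot can be tried out with a few lines of Python. This is only a sketch: Python's re module stands in for the IMatch regex engine (which may differ in details), and the sample strings are made up for illustration.

```python
import re

# Python's re module is a stand-in here; the sample strings are invented.
samples = ["b1g", "bag", "bg", "b.g"]

# The unescaped dot matches any single character between 'b' and 'g'.
any_char = [s for s in samples if re.search(r"b.g", s)]

# The escaped dot matches only a literal '.'.
literal_dot = [s for s in samples if re.search(r"b\.g", s)]

print(any_char)     # ['b1g', 'bag', 'b.g']
print(literal_dot)  # ['b.g']
```

Note that "bg" is not found by either pattern, because the dot always stands for exactly one character.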

Anchors

The ^ matches the start of a line.
The $ matches the end of a line.

Using these two characters you can specify that you want to find text which starts with (^) or ends with ($) a pattern:

^beach

Find beach only when it is at the very beginning of a text.

beach$

Find beach only when it is at the very end of the text.
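As a small illustration of the two anchors, the following sketch filters a list of invented sample texts. It assumes Python's re module as a stand-in for the IMatch regex engine.

```python
import re

# Invented sample texts; Python's re module is used as a stand-in engine.
texts = ["beach at sunset", "sunset at the beach", "the beach at sunset"]

# ^beach: 'beach' must be at the very beginning of the text.
starts_with = [t for t in texts if re.search(r"^beach", t)]

# beach$: 'beach' must be at the very end of the text.
ends_with = [t for t in texts if re.search(r"beach$", t)]

print(starts_with)  # ['beach at sunset']
print(ends_with)    # ['sunset at the beach']
```

The third text contains 'beach' in the middle only, so neither anchored pattern finds it.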

Sub Expressions

A section beginning with ( and ending with ) results in a marked sub expression. Sub expressions can be used to repeat sequences of characters, in the form of (abc)* or (abc)?.

Repeats

Any atom (a single character, a marked sub-expression, or a character class) can be repeated with the *, +, ?, and {} operators:

* matches the preceding atom zero or more times.
+ matches the preceding atom one or more times.
? matches the preceding atom zero or one times.
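The three repeat operators can be compared side by side with a short sketch. It assumes Python's re module as a stand-in for the IMatch regex engine; the helper name matched and the sample texts are hypothetical and chosen only for this illustration.

```python
import re

def matched(pattern, text):
    """Return the part of text that the pattern matched, or None if nothing matched."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

print(matched(r"ba*d", "bd"))       # 'bd'    (* allows zero 'a's)
print(matched(r"ba+d", "bd"))       # None    (+ needs at least one 'a')
print(matched(r"ba+d", "baaad"))    # 'baaad'
print(matched(r"ba?d", "bad"))      # 'bad'
print(matched(r"ba?d", "baad"))     # None    (? allows at most one 'a')
print(matched(r"(ab)+c", "ababc"))  # 'ababc' (the sub expression 'ab' is repeated)
```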

Bounded Repeats

An atom can also be repeated with a bounded repeat:

a{n} Matches 'a' repeated exactly n times.
a{n,} Matches 'a' repeated n or more times.
a{n,m} Matches 'a' repeated between n and m times inclusive.
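The three forms of the bounded repeat can be sketched as follows, with n = 2 and m = 3. The example assumes Python's re module as a stand-in for the IMatch regex engine. It uses re.fullmatch, which requires the whole text to match, so that the counts are easy to see.

```python
import re

# Invented sample texts consisting of one to four '9' characters.
candidates = ["9", "99", "999", "9999"]

# re.fullmatch only succeeds if the entire text matches the pattern.
exactly_two = [c for c in candidates if re.fullmatch(r"9{2}", c)]
two_or_more = [c for c in candidates if re.fullmatch(r"9{2,}", c)]
two_to_three = [c for c in candidates if re.fullmatch(r"9{2,3}", c)]

print(exactly_two)   # ['99']
print(two_or_more)   # ['99', '999', '9999']
print(two_to_three)  # ['99', '999']
```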

Examples

With the special tokens above you can handle almost all regular expressions you will ever need.

frog	Matches any text containing the word 'frog'. No special characters are used here, so the pattern is used "as-is" (as a literal).
a+	Matches all text containing at least one 'a' (an 'a', followed by any number of further 'a's, including zero). Matches: a aaa beat. Does not match: frog
a*	Matches zero or more 'a's. Because zero occurrences are allowed, this pattern on its own matches any text: frog beat
a?	Matches zero or one 'a'. Because zero occurrences are allowed, this pattern on its own also finds a match in any text: frog a aa
^abc	Matches any text starting with 'abc'. Matches: abcX. Does not match: This is the abc
abc$	Matches any text ending with 'abc'. Matches: This is the abc. Does not match: abcX
\.jpg	Matches any text containing '.jpg'. Note the use of the '\' to escape the special meaning of '.'.
\.tif$	Matches any text ending with '.tif'.
(be)*X	Matches any text containing 'be' zero or more times, followed by an X. Matches: beX bebeX. Does not match: bebey
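A few of the examples above can be checked with a short script. This is a sketch that assumes Python's re module as a stand-in for the IMatch regex engine and that a pattern "matches" a text when it is found anywhere in it. The file names scan.tif and scan.tiff are invented for the illustration.

```python
import re

# (pattern, text, expected result of searching for the pattern in the text)
cases = [
    (r"a+", "beat", True),
    (r"a+", "frog", False),
    (r"^abc", "abcX", True),
    (r"^abc", "This is the abc", False),
    (r"abc$", "This is the abc", True),
    (r"abc$", "abcX", False),
    (r"\.tif$", "scan.tif", True),
    (r"\.tif$", "scan.tiff", False),
    (r"(be)*X", "bebeX", True),
    (r"(be)*X", "bebey", False),
]

# Compare the actual search result with the expected result for every case.
all_as_expected = all(
    bool(re.search(pattern, text)) == expected
    for pattern, text, expected in cases
)

print(all_as_expected)  # True
```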

Alternations

Alternations are another important building block for regular expressions. As the name already indicates, you can specify alternatives with this special syntax.
The pipe symbol | is used to specify alternatives. For example, (a|b) means "either a or b" and (a|b)+ means "at least one a or b". You can combine alternations with literals: ab(c|d) matches either abc or abd.
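The alternation patterns mentioned above can be tried out as follows. The sketch assumes Python's re module as a stand-in for the IMatch regex engine; the sample words are invented.

```python
import re

# ab(c|d) is found in any text containing 'abc' or 'abd'.
words = ["abc", "abd", "abe", "xabdx"]
hits = [w for w in words if re.search(r"ab(c|d)", w)]

# (a|b)+ with re.fullmatch: the whole text must consist of 'a's and 'b's.
only_a_b = [w for w in ["a", "abba", "abc", ""] if re.fullmatch(r"(a|b)+", w)]

print(hits)      # ['abc', 'abd', 'xabdx']
print(only_a_b)  # ['a', 'abba']
```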

Character Sets

A character set is a bracket expression starting with [ and ending with ]. It defines a set of characters and matches any single character that is a member of that set.

Single Characters

The expression [abc] matches either 'a', 'b', or 'c'.

Character Ranges

[a-c] matches any single character in the range 'a' to 'c'. You can combine this with a repeat: [0-9]* matches any digit between 0 and 9, any number of times (including zero).

Negation

If you enter a ^ as the first character in a character set, as in [^abc], you negate (invert) the set. The expression [^abc] matches any single character that is not 'a', 'b', or 'c'.
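The following sketch shows single characters, ranges and negation in character sets. It assumes Python's re module as a stand-in for the IMatch regex engine, and the file name is invented. re.findall returns every non-overlapping match, which makes it easy to see what each set picks up.

```python
import re

# Hypothetical file name used only for this illustration.
name = "IMG_2024_0042.jpg"

members = re.findall(r"[abc]", "cabbage")   # single characters from the set
in_range = re.findall(r"[a-c]", "abcdef")   # characters in the range a to c
digit_runs = re.findall(r"[0-9]+", name)    # runs of one or more digits
non_digits = re.findall(r"[^0-9]+", name)   # runs of anything but digits

print(members)     # ['c', 'a', 'b', 'b', 'a']
print(in_range)    # ['a', 'b', 'c']
print(digit_runs)  # ['2024', '0042']
print(non_digits)  # ['IMG_', '_', '.jpg']
```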

Escaping

Any special character preceded with \ matches itself. For example, \^ means '^' and \. means '.'
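Escaping matters most when the text you search for contains special characters itself. The sketch below uses an invented file name with parentheses and a dot, and assumes Python's re module as a stand-in for the IMatch regex engine.

```python
import re

# Hypothetical file name containing the special characters ( ) and .
file_name = "holiday (1).jpg"

escaped = r"holiday \(1\)\.jpg"   # special characters escaped with \
unescaped = r"holiday (1).jpg"    # ( ) form a sub expression, . is a wildcard

found_escaped = bool(re.search(escaped, file_name))
found_unescaped = bool(re.search(unescaped, file_name))

# Without escaping, the pattern means 'holiday 1', any character, 'jpg'.
unexpected_hit = bool(re.search(unescaped, "holiday 1-jpg"))

print(found_escaped)    # True
print(found_unescaped)  # False
print(unexpected_hit)   # True
```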

Boundaries

With these escape sequences you can match using word boundaries:

\< Matches the start of a word.
\> Matches the end of a word.
\b Matches a word boundary (the start or end of a word).
\B Matches only when not at a word boundary.
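The effect of \b and \B can be sketched with a few invented texts. The example assumes Python's re module as a stand-in for the IMatch regex engine. Python's re module does not treat \< and \> as word boundaries, so only \b and \B are shown here.

```python
import re

# Invented sample texts.
texts = ["bar", "a bar stool", "barstool", "handlebar"]

# \bbar\b: 'bar' as a whole word.
whole_word = [t for t in texts if re.search(r"\bbar\b", t)]

# \bbar: 'bar' at the start of a word.
word_start = [t for t in texts if re.search(r"\bbar", t)]

# \Bbar: 'bar' only when it does not begin at a word boundary.
inside_word = [t for t in texts if re.search(r"\Bbar", t)]

print(whole_word)   # ['bar', 'a bar stool']
print(word_start)   # ['bar', 'a bar stool', 'barstool']
print(inside_word)  # ['handlebar']
```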

More Examples

Below you find some more examples of typical regular expressions you may find useful in IMatch.

\.jpg	Matches all text containing '.jpg'.
\.(tif|tiff)	Matches all text containing '.tif' or '.tiff'.
\.(jpg|tif|tiff|dng)$	Matches all text ending in either '.jpg', '.tif', '.tiff', or '.dng'. Regular expressions of this type are useful to find files in specific formats.
\.(doc|xls|ppt)	Matches all text containing '.doc', '.xls', or '.ppt'. This regex finds standard Office documents.
^[0-9]+.*	Matches all text starting with at least one digit, followed by arbitrary text. Matches: 0abc 100abc. Does not match: abc
^_DSC.*	Matches all text starting with _DSC: _DSC2839128
^_DSC[0-9]*\.jpg$	Matches all text starting with _DSC, followed by any number of digits and ending in .jpg. This is a typical file name as produced by many cameras. Matches: _DSC2839128.jpg. Does not match: _DSC2839128.raw

beach	Matches any text containing 'beach'.
^beach	Matches any text beginning with 'beach'.
beach|sun	Matches any text containing either 'beach' or 'sun' or both.
_([0-9|a-z])*\.jpg	Matches all text containing an _, followed by any combination of the characters 0-9 and a-z, followed by .jpg: _abc.jpg _.jpg __abc.jpg. Inside a character set the | is a literal character and not an alternation, so this set also accepts a | character; [0-9a-z] describes digits and lowercase letters only.
^(IMG_|DSC_)	Matches all text starting with either IMG_ or DSC_.
\bbar\b	Performs a full word search on bar. It will match bar but not barstool or handlebar.
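To see how such patterns select file names, the following sketch applies three of them to a small invented list. It assumes Python's re module as a stand-in for the IMatch regex engine; how IMatch itself applies a pattern (for example regarding upper and lower case) may differ.

```python
import re

# Invented file names used only for this illustration.
file_names = [
    "_DSC2839128.jpg",
    "_DSC2839128.raw",
    "IMG_0042.tiff",
    "DSC_0001.dng",
    "notes.doc",
]

# Files in specific image formats.
by_format = [f for f in file_names if re.search(r"\.(jpg|tif|tiff|dng)$", f)]

# Typical camera file names: _DSC, digits, .jpg.
camera_jpgs = [f for f in file_names if re.search(r"^_DSC[0-9]*\.jpg$", f)]

# File names starting with IMG_ or DSC_.
prefixed = [f for f in file_names if re.search(r"^(IMG_|DSC_)", f)]

print(by_format)    # ['_DSC2839128.jpg', 'IMG_0042.tiff', 'DSC_0001.dng']
print(camera_jpgs)  # ['_DSC2839128.jpg']
print(prefixed)     # ['IMG_0042.tiff', 'DSC_0001.dng']
```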

Conclusion

This is most of what you'll ever need regarding regular expressions in IMatch. There are more features in regular expressions, but these are seldom used in an application like IMatch. See the links at the start of this page for additional details and documentation.

Related Web Sites

Regular Expressions Web Site
The Premier website about Regular Expressions

Official Perl Regular Expressions
The official Perl documentation web site