Regular Expressions

Regular expressions are a popular means of searching text. While they are heavily used in Unix environments, Windows users are often unfamiliar with them. However, regular expressions are powerful and simple to use, so we decided to use them in SDLRename for searching in filenames.

The principle of regular expressions is simple: a few special characters function as commands that control the match between the regular expression (the search string) and the filename to be renamed. If a match is found, the matched substring is replaced by the replacement string.
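The match-and-replace principle described above can be sketched in a few lines of Python. This is only an illustration that uses Python's `re` module as a stand-in. It is not SDLRename's own engine, the function name `rename` is hypothetical, and replacing only the first match is an assumption made for this sketch.

```python
import re

def rename(filename: str, pattern: str, replacement: str) -> str:
    """Return filename with the first match of pattern replaced.

    Assumption: only the first match is replaced. The replacement is
    passed through a function so that it is always taken literally.
    """
    return re.sub(pattern, lambda m: replacement, filename, count=1)

print(rename("holiday_001.jpg", "holiday", "vacation"))  # vacation_001.jpg
print(rename("holiday_001.jpg", "^holiday_", ""))        # 001.jpg
print(rename("notes.txt", "xyz", "abc"))                 # notes.txt (no match, unchanged)
```

If the search string does not match, the filename is returned unchanged.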

Below is a short summary of the available regular expression commands.

Command Explanation
\ Backslash. To search for a character that normally has a special meaning in regular expressions, quote it with a preceding backslash (e.g. \. matches a literal period).
^ Circumflex. A circumflex as the first character of the regular expression constrains matches to the beginning of filenames. A circumflex as the first character of a set inverts the set. In any other case a circumflex is not allowed and has to be quoted by a preceding backslash in order to be searched for.
$ Dollar. A dollar sign as the last character of the regular expression constrains matches to the end of a filename.
. Period. A period matches any single character.
[] Square brackets. A string enclosed in square brackets defines a set of characters which matches the filename at the current position. If the first character of the string is a circumflex, the set is inverted (this means that the regular expression matches any character except the characters in the set). To specify a range of characters, connect the first and the last character of the range by a hyphen (e.g. [0-9] defines the set of all digits). Note that most of the special characters usually used as expression specifiers (i.e. $ . [ ( ) ? * and + ) are treated as normal characters when used in a set definition. The only exceptions are the circumflex, the backslash, the hyphen and the closing bracket. These characters have to be quoted by a preceding backslash in order to be processed correctly.
? Question mark. An expression followed by a question mark matches zero or one occurrence of that expression.
* Asterisk. An expression followed by an asterisk matches zero or more occurrences of that expression.
+ Plus. An expression followed by a plus sign matches one or more occurrences of that expression.
\#xx Special character. Specifies an arbitrary character by its ASCII code. xx is a two-digit hexadecimal number representing the ASCII code of this character.
() Parentheses. Parentheses can be used to group characters together prior to using the repetition operators * + or ?. Parentheses must not be nested.
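Several of the commands in the table can be tried out with the following short sketch. It uses Python's `re` module, which happens to share the syntax of the commands shown here, but it is not the engine used by SDLRename. The \#xx notation is specific to this table and is therefore left out. The helper name `first_match` is hypothetical.

```python
import re

def first_match(pattern: str, filename: str):
    """Return the text of the first match, or None if nothing matches."""
    m = re.search(pattern, filename)
    return m.group(0) if m else None

print(first_match(r"^b", "abc"))              # None: 'b' is not at the beginning
print(first_match(r"png$", "png_file.png"))   # png: matched at the end only
print(first_match(r"\.", "a.b"))              # . (a quoted, literal period)
print(first_match(r"[0-9]", "track07.mp3"))   # 0: first character from the set
print(first_match(r"[^a-z]", "track07.mp3"))  # 0: first character NOT in the set
print(first_match(r"colou?r", "color.txt"))   # color: the 'u' is optional
print(first_match(r"(ab)+", "xababab.txt"))   # ababab: the group is repeated
```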
Please note one particularly common mistake: the asterisk (*) in regular expressions does not mean the same as the star in DOS/Windows filename wildcards. In a regular expression the star means "zero or more occurrences of the preceding expression", whereas in DOS/Windows wildcards it means "any number of valid characters". The regular expression corresponding to the DOS wildcard "*" is ".*" (mind the dot before the star).
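The difference between the two notations can be made concrete with a small translation helper. The sketch below is written in Python and is not part of SDLRename. The function name `wildcard_to_regex` is hypothetical, and the handling of the wildcard "?" as "exactly one arbitrary character" is an assumption that goes beyond the text above.

```python
import re

def wildcard_to_regex(wildcard: str) -> str:
    """Translate a DOS-style wildcard into an anchored regular expression."""
    parts = []
    for ch in wildcard:
        if ch == "*":
            parts.append(".*")           # any number of arbitrary characters
        elif ch == "?":
            parts.append(".")            # assumption: exactly one character
        else:
            parts.append(re.escape(ch))  # quote everything else, e.g. the period
    return "^" + "".join(parts) + "$"

pattern = wildcard_to_regex("*.txt")
print(pattern)                                   # ^.*\.txt$
print(bool(re.search(pattern, "report.txt")))    # True
print(bool(re.search(pattern, "report.txt.bak")))  # False

# The star on its own only repeats the preceding expression:
print(bool(re.fullmatch("a*", "aaa")))  # True: three times 'a'
print(bool(re.fullmatch("a*", "abc")))  # False: 'b' and 'c' are not covered
```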

Examples:

.* denotes any number of arbitrary characters
[0-9]+ any decimal number ([0-9] means any digit, + repeats it one or more times)
test.?txt matches strings such as test.txt, testtxt, test3txt, test-txt but not test--txt
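The three examples above can be checked with a short Python sketch. Python's `re` module is used here only as a convenient stand-in for experimenting, and the helper names `matches_whole` and `find_numbers` are hypothetical.

```python
import re

def matches_whole(pattern: str, text: str) -> bool:
    """True if the pattern matches the complete text."""
    return re.fullmatch(pattern, text) is not None

def find_numbers(filename: str) -> list:
    """Return all runs of decimal digits found in the filename."""
    return re.findall(r"[0-9]+", filename)

# .* accepts any string, including the empty one
print(matches_whole(r".*", "any file name.doc"))  # True
print(matches_whole(r".*", ""))                   # True

# [0-9]+ picks out each run of one or more digits
print(find_numbers("img0042_v2.png"))             # ['0042', '2']

# test.?txt allows at most one arbitrary character between "test" and "txt"
for name in ["test.txt", "testtxt", "test3txt", "test-txt", "test--txt"]:
    print(name, matches_whole(r"test.?txt", name))
```

Only the last name, test--txt, is rejected, because .? covers at most one character.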


Last Update: 2006-Nov-01