POSIX regular expressions can be used to select files to include in, or exclude from, file sets and searches.

They are much like the wildcard characters in Microsoft Windows, but much more flexible and powerful.

Microsoft Windows uses two standard wildcards: * matches any number of any characters, and ? matches any single character. For example, ?.doc matches a.doc but not file.doc, while *.doc matches both.
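One way to experiment with these two wildcards is Python's fnmatch module. Its shell-style * and ? behave like the Windows ones for simple patterns such as these, although it is not the Windows matcher itself (fnmatchcase, used here, is case-sensitive, while Windows is not). The file names are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical file names to test the wildcards against.
names = ["a.doc", "file.doc", "notes.txt"]

# ? stands for exactly one character, so only a one-letter name before .doc fits.
print([n for n in names if fnmatchcase(n, "?.doc")])  # ['a.doc']

# * stands for any number of characters.
print([n for n in names if fnmatchcase(n, "*.doc")])  # ['a.doc', 'file.doc']
```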

POSIX regular expressions have more options. These are some of the special characters used in POSIX regular expressions:

Character | Description
. | Matches any single character
[] | Matches any single character listed within the square brackets
[^] | Matches any single character not listed within the square brackets
\ | Escape character. Toggles the special meaning of the following character. To match a literal backslash (\), enter \\ (the first backslash removes the special meaning of the second)
\d | Matches any digit. Shorter variant of [0-9]
\D | Matches any character except a digit. Shorter variant of [^0-9]
$ | Matches the end of the filename
^ | Matches the start of the filename
* | Modifies the preceding character to match zero or more times
+ | Modifies the preceding character to match one or more times
? | Modifies the preceding character to match zero times or once
{m} | Modifies the preceding character to match exactly m times, e.g. .{3} matches any three characters
{m,} | Modifies the preceding character to match m or more times, e.g. .{3,} matches three or more characters
{m,n} | Modifies the preceding character to match from m to n times, e.g. .{3,4} matches any three or four characters
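Several of these special characters can be tried out with a short Python sketch. Python's re module is not a POSIX engine, but for simple patterns like these it agrees with POSIX on whether a name matches. re.search is used because it succeeds when the pattern matches any part of the name. The helper `matches` and the file names are hypothetical.

```python
import re

def matches(pattern, name):
    # True if the pattern matches anywhere in the name.
    return re.search(pattern, name) is not None

print(matches(r"colou?r", "color.txt"))     # True: ? allows zero u's
print(matches(r"colou?r", "colour.txt"))    # True: ? allows one u
print(matches(r"\d{4}", "report2023.pdf"))  # True: four digits in a row
print(matches(r"^\D+$", "readme"))          # True: the whole name has no digits
print(matches(r"^\D+$", "v2"))              # False: the name contains a digit
```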

A regular expression matches a file if it matches any part of the filename. For example, the regular expression g matches any file with the letter g anywhere in its name. $ matches the end of the filename (including the extension), so g$ matches any filename ending in g, such as .mpg, .png, and .jpg files.
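The difference between an unanchored pattern and one ending in $ can be sketched with Python's re.search, which, like the matching described above, succeeds if the pattern fits any part of the name. The file names are hypothetical.

```python
import re

names = ["movie.mpg", "logo.png", "photo.jpg", "gallery.txt", "notes.doc"]

# Unanchored: g may appear anywhere in the name.
anywhere = [n for n in names if re.search(r"g", n)]
print(anywhere)  # ['movie.mpg', 'logo.png', 'photo.jpg', 'gallery.txt']

# Anchored with $: g must be the last character.
at_end = [n for n in names if re.search(r"g$", n)]
print(at_end)    # ['movie.mpg', 'logo.png', 'photo.jpg']
```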

Assume you have the following files: a.doc, a.dooc, a.dc, a.dac, and aldoc.

The regular expression a\.d.c matches both a.doc and a.dac. The backslash (\) makes the period between a and d match a literal period only, not any character; the second, unescaped period still matches any character.

Without the backslash, the regular expression a.d.c matches a.doc, a.dac, and also aldoc, because the first period now matches the l.
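The two patterns can be compared on the five sample files with Python's re.search, used here only as a stand-in for the CFA's matcher. In Python source, a raw string (r"...") passes the backslash through to the regex engine unchanged, so it does not need to be doubled for Python's sake.

```python
import re

files = ["a.doc", "a.dooc", "a.dc", "a.dac", "aldoc"]

escaped = [f for f in files if re.search(r"a\.d.c", f)]   # \. is a literal period
unescaped = [f for f in files if re.search(r"a.d.c", f)]  # . is any character
print(escaped)    # ['a.doc', 'a.dac']
print(unescaped)  # ['a.doc', 'a.dac', 'aldoc']
```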

Wrapping characters in square brackets matches a single character against any character in the set. For example, the regular expression d[abcde]c matches a.dac, but not a.doc (because o isn’t listed). Square brackets can also contain a range of characters, so the same regular expression can be written more compactly as d[a-e]c. A caret (^) right after the opening bracket matches any character not in the list, so d[^f-z]c matches a.dac, but not a.doc, since o falls between f and z.
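A quick check in Python (re.search standing in for the CFA's matcher) shows that the listed set, the range, and the negated range all pick out the same file here, since each accepts exactly one character between d and c.

```python
import re

files = ["a.doc", "a.dooc", "a.dc", "a.dac", "aldoc"]

results = {}
for pattern in [r"d[abcde]c", r"d[a-e]c", r"d[^f-z]c"]:
    # Each bracket expression stands for exactly one character.
    results[pattern] = [f for f in files if re.search(pattern, f)]
    print(pattern, results[pattern])  # each pattern prints ['a.dac']
```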

* modifies the preceding character to match zero or more times. So the regular expression do*c matches a.dc, a.doc, a.dooc, and also aldoc. * can also modify square brackets: the regular expression d[a-e]*c matches a.dc and a.dac, but not a.doc or a.dooc, because o is outside the range a-e.
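Running both * patterns over the five sample files in Python (re.search as a stand-in for the CFA's matcher) shows which names each one accepts; aldoc qualifies for do*c because it contains doc.

```python
import re

files = ["a.doc", "a.dooc", "a.dc", "a.dac", "aldoc"]

star_o = [f for f in files if re.search(r"do*c", f)]          # zero or more o's
star_range = [f for f in files if re.search(r"d[a-e]*c", f)]  # zero or more of a-e
print(star_o)      # ['a.doc', 'a.dooc', 'a.dc', 'aldoc']
print(star_range)  # ['a.dc', 'a.dac']
```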

+ works like *, except that it requires at least one occurrence of the preceding character. So the regular expression do+c matches a.doc, a.dooc, and aldoc, but not a.dc.
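Putting * and + side by side in Python (re.search standing in for the CFA's matcher) makes the difference visible: only the name with no o between d and c drops out.

```python
import re

files = ["a.doc", "a.dooc", "a.dc", "a.dac", "aldoc"]

star = [f for f in files if re.search(r"do*c", f)]  # zero or more o's
plus = [f for f in files if re.search(r"do+c", f)]  # at least one o
print(star)  # ['a.doc', 'a.dooc', 'a.dc', 'aldoc']
print(plus)  # ['a.doc', 'a.dooc', 'aldoc']
```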

The {} modifier sets a fixed number or a range of repetitions. For example, .do{2}c matches a.dooc. You can also specify a range: the regular expression a\.do{0,1}c matches a.dc and a.doc, but not a.dooc.
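The brace examples can be checked in Python as well (re.search as a stand-in for the CFA's matcher). The last line also shows that {0,1} is the same as the ? modifier, which means zero times or once.

```python
import re

files = ["a.doc", "a.dooc", "a.dc", "a.dac", "aldoc"]

exactly_two = [f for f in files if re.search(r".do{2}c", f)]
zero_or_one = [f for f in files if re.search(r"a\.do{0,1}c", f)]
question = [f for f in files if re.search(r"a\.do?c", f)]  # ? is shorthand for {0,1}
print(exactly_two)  # ['a.dooc']
print(zero_or_one)  # ['a.doc', 'a.dc']
print(question)     # ['a.doc', 'a.dc']
```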

Note

The CFA uses a slash (/) to separate directories, even on Windows, where directories are normally separated with a backslash (\).
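As a rough illustration, a Windows path can be rewritten with slashes using Python's pathlib, and a pattern that names a directory then uses / rather than \\. This sketch assumes the pattern is tested against the full slash-separated path, which is not stated here; the path and the folder name Documents are hypothetical.

```python
import re
from pathlib import PureWindowsPath

# Hypothetical Windows path; as_posix() rewrites the backslashes as slashes.
path = PureWindowsPath(r"C:\Users\Public\Documents\report.doc").as_posix()
print(path)  # C:/Users/Public/Documents/report.doc

# Assumption: the full path is matched. This pattern accepts a .doc file
# located directly inside a folder named Documents.
in_documents = re.search(r"/Documents/[^/]*\.doc$", path) is not None
print(in_documents)  # True
```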

Below are a few common Windows wildcard patterns and their equivalent regular expressions.

Windows wildcard | POSIX regular expression | Explanation
*z*.*, *.*z* | z | Matches any file with z in its filename or extension
*.com | \.com$ | Matches all .com files
*.?om | \..om$ | Matches all .aom, .bom, .com, etc. files
*.aom, *.bom, *.zom | \.[abz]om$ | Matches all .aom, .bom, and .zom files
a*.*, b*.*, c*.*, d*.*, e*.*, f*.*, g*.*, h*.*, i*.*, j*.* | ^[a-j] | Matches all files and directories whose names start with a letter from a to j
N/A | ^[0-9]+\.doc$ | Matches all .doc files whose names consist only of digits
N/A | ^[0-9].*\.doc$ | Matches all .doc files whose names start with a digit
N/A | ^[0-9]{6}\.doc$ | Matches all .doc files whose names consist of exactly six digits
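The wildcard/regex pairs can be cross-checked on a set of hypothetical names, using Python's fnmatch for the wildcards and re.search for the regular expressions. fnmatch follows shell rules rather than Windows rules: fnmatchcase is case-sensitive, and its *.* requires a dot, whereas Windows also matches names without an extension. For that reason, every sample name below is lowercase and has an extension.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical sample names (lowercase, each with an extension).
names = ["lazy.txt", "a.com", "b.bom", "c.zom", "d.com.bak",
         "x.custom", "apple.doc", "junk.doc", "123456.doc", "7up.doc"]

# (Windows wildcards, equivalent regular expression)
pairs = [
    (["*z*.*", "*.*z*"], r"z"),
    (["*.com"], r"\.com$"),
    (["*.?om"], r"\..om$"),
    (["*.aom", "*.bom", "*.zom"], r"\.[abz]om$"),
    ([c + "*.*" for c in "abcdefghij"], r"^[a-j]"),
]

for wildcards, regex in pairs:
    by_wildcard = [n for n in names if any(fnmatchcase(n, w) for w in wildcards)]
    by_regex = [n for n in names if re.search(regex, n)]
    print(regex, by_regex, "same as wildcards:", by_wildcard == by_regex)

# Patterns with no single wildcard equivalent
for regex in [r"^[0-9]+\.doc$", r"^[0-9].*\.doc$", r"^[0-9]{6}\.doc$"]:
    print(regex, [n for n in names if re.search(regex, n)])
```

The name x.custom shows why the period in \..om$ is escaped: an unescaped ..om$ would also match it, because "stom" ends the name.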

This covers only the basics of POSIX regular expressions, which are very powerful and offer many more options.