Regular Expressions

Several forms have filters where the user can type in a regular expression to restrict the list of items displayed. This appendix documents the supported regular expression language.
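As a rough illustration of how such a filter might behave, the sketch below applies an expression to a list of names. It rests on several assumptions that this appendix does not confirm: that the language is the one implemented by java.util.regex, that a name is kept when the expression is found anywhere in it, and that case-insensitive matching is enabled by default. The class NameFilter and the method filter are hypothetical names used only for this example.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class NameFilter {
    // Hypothetical helper: keeps the names in which the expression is found.
    // Assumes "find" semantics and case-insensitive matching by default.
    static List<String> filter(List<String> names, String regex) {
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        return names.stream()
                .filter(name -> p.matcher(name).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of("MyPoint", "mypoint2", "Setpoint", "Valve");
        System.out.println(filter(names, "^my"));    // [MyPoint, mypoint2]
        System.out.println(filter(names, "point$")); // [MyPoint, Setpoint]
    }
}
```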

Characters

c

Matches the character c.

\\

Matches the backslash character.

\0n

Matches the character specified by the octal constant n, which must be between 0 and 0377.

\xhh

Matches the character specified by the two digit hexadecimal constant.

\uhhhh

Matches the character specified by the four digit hexadecimal constant.

\t

Matches the tab character.

\n

Matches the newline character.

\r

Matches the carriage-return character.

\f

Matches the form-feed character.

\a

Matches the alert character.

\e

Matches the escape character.

\cc

Matches the control character corresponding to c. For example, \cM matches Control-M (the carriage-return character).
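The escapes above can be tried out with a short Java sketch, assuming the filters follow java.util.regex. The class EscapeDemo and its matches helper are hypothetical. Note that a regular expression typed into a filter uses a single backslash; the doubled backslashes below are needed only because the expressions appear inside Java string literals.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // Hypothetical helper: true if the whole input matches the expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("\\0101", "A"));   // true: octal 101 is 'A'
        System.out.println(matches("\\x41", "A"));    // true: hex 41 is 'A'
        System.out.println(matches("\\u0041", "A"));  // true: four-digit hex form
        System.out.println(matches("\\t", "\t"));     // true: tab
        System.out.println(matches("\\cM", "\r"));    // true: Control-M is carriage return
        System.out.println(matches("\\\\", "\\"));    // true: a single backslash
        System.out.println(matches("\\x41", "a"));    // false: case matters here
    }
}
```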

Character Classes

[abc]

Matches any of a list of characters.

[a-zA-Z]

Matches ranges of characters.

[^abc]

Matches any character except those in the class.

[abc&&[xyz]]

Matches any character in both classes.

.

Matches any character.

\d

Matches any digit.

\D

Matches any non-digit.

\s

Matches any whitespace character: space, tab, newline, form-feed, carriage-return, or \x0B.

\S

Matches any non-whitespace character.

\w

Matches any word character: [a-zA-Z_0-9].

\W

Matches any non-word character.

\p{Lower}

Matches any lower-case character: [a-z].

\p{Upper}

Matches any upper-case character: [A-Z].

\p{ASCII}

Matches any 7-bit ASCII character: [\x00-\x7F].

\p{Alpha}

Matches any alphabetic character.

\p{Digit}

Matches any digit.

\p{Alnum}

Matches any alphanumeric character.

\p{Punct}

Matches any punctuation: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~.

\p{Graph}

Matches any visible character: [\p{Alnum}\p{Punct}].

\p{Print}

Matches any printable character: [\p{Graph}\x20].

\p{Blank}

Matches a space or a tab.

\p{Cntrl}

Matches a control character: [\x00-\x1F\x7F].

\p{XDigit}

Matches a hexadecimal digit: [0-9a-fA-F].

\p{Space}

Matches a whitespace character: space, tab, new-line, \x0B, form feed, or carriage-return.

\p{javaLowerCase}

Matches any character for which java.lang.Character.isLowerCase() is true.

\p{javaUpperCase}

Matches any character for which java.lang.Character.isUpperCase() is true.

\p{javaWhitespace}

Matches any character for which java.lang.Character.isWhitespace() is true.

\p{javaMirrored}

Matches any character for which java.lang.Character.isMirrored() is true.

\p{InGreek}

Matches any character in the Unicode Greek block.

\p{L}

Matches any Unicode letter.

\p{Lu}

Matches any Unicode upper-case letter.

\P{InGreek}

Matches any character not in the Unicode Greek block.

[\p{L}&&[^\p{Lu}]]

Matches any Unicode letter that is not an upper-case letter.
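The following sketch exercises a few of the character classes, again assuming java.util.regex semantics without the case-insensitive default of the filters. ClassDemo and matches are hypothetical names. Non-ASCII letters are written as Java Unicode escapes so that the source stays pure ASCII.

```java
import java.util.regex.Pattern;

public class ClassDemo {
    // Hypothetical helper: true if the whole input matches the expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9";       // lower-case e with acute accent
        String capitalEAcute = "\u00c9"; // upper-case E with acute accent
        String lambda = "\u03bb";       // Greek small letter lambda

        // Intersection: lower-case ASCII letters that are not vowels.
        System.out.println(matches("[a-z&&[^aeiou]]", "b"));  // true
        System.out.println(matches("[a-z&&[^aeiou]]", "e"));  // false

        // Unicode letter that is not an upper-case letter.
        System.out.println(matches("[\\p{L}&&[^\\p{Lu}]]", eAcute));        // true
        System.out.println(matches("[\\p{L}&&[^\\p{Lu}]]", capitalEAcute)); // false

        // The POSIX-style classes cover US-ASCII only.
        System.out.println(matches("\\p{Lower}", eAcute));    // false

        System.out.println(matches("\\p{InGreek}", lambda));  // true
        System.out.println(matches("\\P{InGreek}", "a"));     // true
        System.out.println(matches("\\w+", "point_7"));       // true
    }
}
```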

Boundary Markers

^

Matches the beginning of a name.

$

Matches the end of a name.

\b

Matches a word boundary.

\B

Matches a non-word boundary.
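A small sketch of the boundary markers, assuming java.util.regex and assuming that an expression may match anywhere inside a name (which is what makes ^ and $ useful). BoundaryDemo and found are hypothetical names.

```java
import java.util.regex.Pattern;

public class BoundaryDemo {
    // Hypothetical helper: true if the expression is found anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("^pump", "pump station"));    // true
        System.out.println(found("^pump", "main pump"));       // false
        System.out.println(found("pump$", "main pump"));       // true
        System.out.println(found("\\bcat\\b", "the cat sat")); // true
        System.out.println(found("\\bcat\\b", "concatenate")); // false: inside a word
        System.out.println(found("\\Bcat\\B", "concatenate")); // true
    }
}
```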

Quantifiers

X?

Matches X zero or one times.

X*

Matches X zero or more times.

X+

Matches X one or more times.

X{n}

Matches X n times.

X{n,}

Matches X n or more times.

X{n,m}

Matches X n to m times (inclusive).

Q

Matches as much text as possible with quantifier Q, but will back off from that to let the overall match succeed (greedy quantifier).

Q?

Matches the minimum text possible with quantifier Q, but will match additional text to let the overall match succeed (reluctant quantifier).

Q+

Matches as much text as possible with quantifier Q, and will not back off to let the overall match succeed (possessive quantifier).
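The difference between greedy, reluctant, and possessive quantifiers is easiest to see on one input. The sketch below assumes java.util.regex; QuantifierDemo and firstMatch are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Hypothetical helper: the first matched text, or null if there is no match.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<a><b>";
        // Greedy: takes everything, then backs off to the last '>'.
        System.out.println(firstMatch("<.+>", input));   // <a><b>
        // Reluctant: stops at the first '>' that lets the match succeed.
        System.out.println(firstMatch("<.+?>", input));  // <a>
        // Possessive: takes everything and never backs off, so '>' cannot match.
        System.out.println(firstMatch("<.++>", input));  // null
        // Bounded repetition is greedy as well.
        System.out.println(firstMatch("a{2,3}", "aaaa")); // aaa
    }
}
```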

Logical Operators

XY

Matches X followed by Y.

X|Y

Matches either X or Y.

(X)

Matches X as a capturing group. Capturing groups are numbered in the order of the opening parentheses.

\n

Matches whatever capturing group number n matched (a back reference).

\c

Matches c if \c isn’t otherwise a special character.

\Q...\E

Matches the enclosed characters without interpreting special characters.
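Group numbering, back references, and quoting can be sketched as follows, assuming java.util.regex. GroupDemo, matches, and group are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Hypothetical helper: true if the whole input matches the expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Hypothetical helper: text captured by group n, or null if nothing matches.
    static String group(String regex, String input, int n) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(n) : null;
    }

    public static void main(String[] args) {
        System.out.println(matches("cat|dog", "dog"));            // true

        // Groups are numbered by their opening parenthesis, left to right.
        String range = "((\\d+)-(\\d+))";
        System.out.println(group(range, "10-20", 1));             // 10-20
        System.out.println(group(range, "10-20", 2));             // 10
        System.out.println(group(range, "10-20", 3));             // 20

        // Back reference: the same word character twice in a row.
        System.out.println(matches("(\\w)\\1", "oo"));            // true
        System.out.println(matches("(\\w)\\1", "ab"));            // false

        // Quoting: the dot is taken literally between the quote markers.
        System.out.println(matches("a.b", "axb"));                // true
        System.out.println(matches("\\Qa.b\\E", "axb"));          // false
        System.out.println(matches("\\Qa.b\\E", "a.b"));          // true
        System.out.println(matches("a\\.b", "a.b"));              // true
    }
}
```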

Special Constructs

(?:X)

Matches X as a non-capturing group.

(?>X)

Matches X as an independent, non-capturing group. Once X has matched, the matcher will not backtrack into it.

(?=X)

Matches X as a zero-width lookahead.

(?!X)

Matches only if X does not match at this position (zero-width negative lookahead).

(?<=X)

Matches X as a zero-width lookbehind.

(?<!X)

Matches only if X does not match immediately before this position (zero-width negative lookbehind).

(?options-options)

Turns options on or off for the remainder of the expression. Because case-insensitive matching is on by default in filters, the expression my(?-i)P(?i)oint matches the letter “P” case-sensitively but ignores case for all the other letters.

(?options-options:X)

Matches X using the specified options.
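The special constructs can be checked with the sketch below. It assumes java.util.regex and compiles every expression with the case-insensitive flag to imitate the filter default described above. ConstructDemo and found are hypothetical names, and the sample names are invented for illustration.

```java
import java.util.regex.Pattern;

public class ConstructDemo {
    // Hypothetical helper: true if the expression is found anywhere in the input.
    // Case-insensitive by default, as an assumption about how filters compile it.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).find();
    }

    public static void main(String[] args) {
        // Inline options: only the letter P is case sensitive.
        System.out.println(found("my(?-i)P(?i)oint", "MYPOINT")); // true
        System.out.println(found("my(?-i)P(?i)oint", "mypoint")); // false
        System.out.println(found("my(?-i:P)oint", "myPoint"));    // true

        // Lookahead and lookbehind test the context without consuming it.
        System.out.println(found("pump(?=\\d)", "pump7"));        // true
        System.out.println(found("pump(?!\\d)", "pump7"));        // false
        System.out.println(found("(?<=north-)pump", "north-pump")); // true
        System.out.println(found("(?<!north-)pump", "north-pump")); // false

        // An ordinary group gives back one "a"; an independent group does not.
        System.out.println(found("^(?:a+)ab$", "aaab"));          // true
        System.out.println(found("^(?>a+)ab$", "aaab"));          // false
    }
}
```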

Options

i

Enables case-insensitive matching. This option is on by default in the filters.

d

Enables Unix lines mode, where only the \n line terminator is recognized by ., ^, and $ in regular expression patterns.

m

Enables multiline mode, where ^ and $ match just after or just before line terminators in addition to the beginning and end of the input sequence.

s

Enables dotall mode, where . matches line terminators.

u

Enables Unicode-aware case folding, where case-insensitive matching (which is enabled or disabled separately) follows the Unicode Standard instead of applying only to US-ASCII characters.

x

Permits comments (from # character to end of line) and whitespace in regular expression patterns.
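Each option letter can be tried inline, as in the sketch below. It assumes java.util.regex and, unlike the filters, starts with every option off so that the effect of each letter is visible. OptionDemo and found are hypothetical names. Accented letters are written as Java Unicode escapes to keep the source pure ASCII.

```java
import java.util.regex.Pattern;

public class OptionDemo {
    // Hypothetical helper: true if the expression is found anywhere in the input.
    // No options are set here; each example turns on what it needs inline.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // i: ignore case.
        System.out.println(found("valve", "VALVE"));       // false
        System.out.println(found("(?i)valve", "VALVE"));   // true

        // m: ^ and $ also match around line terminators.
        System.out.println(found("^b$", "a\nb"));          // false
        System.out.println(found("(?m)^b$", "a\nb"));      // true

        // s: the dot matches line terminators too.
        System.out.println(found("a.b", "a\nb"));          // false
        System.out.println(found("(?s)a.b", "a\nb"));      // true

        // d: only the newline counts as a terminator, so the dot matches a carriage return.
        System.out.println(found("a.b", "a\rb"));          // false
        System.out.println(found("(?d)a.b", "a\rb"));      // true

        // u: case folding beyond US-ASCII (e-acute against capital E-acute).
        System.out.println(found("(?i)\u00e9", "\u00c9"));  // false
        System.out.println(found("(?iu)\u00e9", "\u00c9")); // true

        // x: whitespace and comments inside the expression are ignored.
        System.out.println(found("(?x) a b c  # three letters", "abc")); // true
    }
}
```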