Regex Pattern Features

Introduction

This page describes the specRegex regex dialect and how it compares to some mainstream regex engines. In most cases, your regex will “just work” with specRegex.

The following comparison chart is essentially the one from the reference material provided by regular-expressions.info, with specRegex added. The tutorial provided on that website is probably the best possible way to learn how to write regular expressions.

There is also a similar chart for replacement strings.

Special and Non-Printable Characters

Feature Syntax Description Example specRegex .NET std::regex PCRE2 RE2
Literal Character Any character except [\^$.|?*+() All non-special characters match a single instance of themselves a matches a
Literal curly braces { and } { and } are literals unless they’re part of a token such as a {3}
Backslash escapes a metacharacter \ followed by any of [\^$.|?*+(){} A backslash escapes special characters to suppress their special meaning \* matches *
Escape sequence \Q...\E Matches all characters in ... literally \Q.+*\E matches .+*
Hexadecimal escape \xFF where FF are 2 hexadecimal digits Matches the single-byte character FF
Character escape \n, \r and \t Match an LF character, CR character and a tab character respectively \r\n matches a CRLF line break
Line break \R Any line break: CRLF, CR, LF, \f, \v, or any Unicode line break
Line break \R Matches the next line control character U+0085
Line break \R CRLF line breaks are indivisible
Line break A literal line break Matches any line break, regardless of the line break style used.
Character escape \a Match the “alert” or “bell” control character (ASCII 0x07)
Character escape \b Match the “backspace” control character (ASCII 0x08)
Character escape \B Match a backslash (Use \\ instead)
Character escape \e Match the “escape” control character (ASCII 0x1B)
Character escape \f Match the “form feed” control character (ASCII 0x0C)
Character escape \v Match the “vertical tab” control character (ASCII 0x0B)
Control character escape \cA through \cZ ASCII character ^A through ^Z, equivalent to \x01 through \x1A \cM\cJ matches a CRLF line break
Control character escape \ca through \cz ASCII character ^A through ^Z, equivalent to \x01 through \x1A \cm\cj matches a CRLF line break
NULL escape \0 Matches the null character
Octal escape \o{7777} for any octal number Matches the character with the specified number \o{20254} matches €
Octal escape \1 through \7 Matches the character at the specified position in the ASCII table \7 matches the “bell” character ✔ (Opt)
Octal escape \10 through \77 Matches the character at the specified position in the ASCII table \77 matches ? ✔ (Opt)
Octal escape \100 through \177 Matches the character at the specified position in the ASCII table \100 matches @ ✔ (Opt)
Octal escape \200 through \377 Matches the character at the specified position in the active code page \377 matches ÿ ✔ (Opt)
Octal escape \400 through \777 Matches the character at the specified position in the active code page \777 matches ǿ ✔ (Opt)
Octal escape \01 through \07 Matches the character at the specified position in the ASCII table \07 matches the “bell” character ✔ (Opt)
Octal escape \010 through \077 Matches the character at the specified position in the ASCII table \077 matches ? ✔ (Opt)
Octal escape \0100 through \0177 Matches the character at the specified position in the ASCII table \0100 matches @
Octal escape \0200 through \0377 Matches the character at the specified position in the active code page \0377 matches ÿ
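
As a rough illustration of a few of the escapes above, here is a minimal sketch using Python's built-in re module. Python is only a convenient stand-in: it is not one of the engines in the chart, and specRegex may behave differently. The variable names are illustrative.

```python
import re

# Hexadecimal escape: \x41 is the character with code 0x41, i.e. "A".
hex_match = re.fullmatch(r"\x41", "A")

# Character escapes: \r\n matches a CRLF line break.
crlf_match = re.search(r"\r\n", "line one\r\nline two")

# Three-digit octal escape: \100 is octal 100 (decimal 64), i.e. "@".
octal_match = re.fullmatch(r"\100", "@")

# A backslash suppresses the special meaning of a metacharacter.
star_match = re.fullmatch(r"\*", "*")

print(bool(hex_match), crlf_match.span(), bool(octal_match), bool(star_match))
```

Shorter octal forms such as \7 or \77 are ambiguous with backreferences in many engines, which is presumably why the chart lists each length separately.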

Basic Features

Feature Syntax Description Example specRegex .NET std::regex PCRE2 RE2
Dot . Matches any character except line breaks, depending on options.
Not a line break \N Matches any character except line break, regardless of options.
Alternation | Match either of several possible patterns. ab|de|xy matches ab, de or xy
Line feed is alternation A literal line break A literal line break character functions as alternation.
Alternation is eager | Alternation takes the first option from the left that matches. a|ab matches a in ab ✔ (Opt)
Alternation is greedy | Alternation takes the longest possible match from all options. a|ab matches ab in ab ✔ (Opt)
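
The difference between eager and longest-match alternation is easiest to see by running the a|ab example. The sketch below uses Python's re module, whose alternation is eager; it is a stand-in for illustration and says nothing about which behavior specRegex uses by default.

```python
import re

# Eager alternation: the leftmost option that matches wins,
# even though a longer option would also match here.
eager = re.search(r"a|ab", "ab").group()

# With an eager engine, putting the longer option first
# recovers the longest match.
reordered = re.search(r"ab|a", "ab").group()

# Plain alternation between several literal options.
options = re.findall(r"ab|de|xy", "ab de xy zz")

print(eager, reordered, options)
```

An engine with greedy (longest) alternation would return ab for both of the first two patterns.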

Character Classes

Unless otherwise noted, the syntax in this section is valid only inside a character class. In effect, this is describing the language used to define character classes in each regex dialect.

Feature Syntax Description Example specRegex .NET std::regex PCRE2 RE2
Character classes [ [ begins a character class. Inside a character class, different rules apply
Literal character Any character except ^-]\ Non-special characters add themselves to the character class [abc] matches a, b or c
Backslash escapes metacharacter \ followed by any of ^-]\ A backslash escapes special characters to suppress their special meaning. [\^\]] matches ^ or ]
Range Hyphen between two characters, e.g. a-z Adds a range of characters to the character class. [a-zA-Z0-9] matches any ASCII letter or digit
Ranges with escapes Ranges support character escapes Adds a range of characters to the character class. [\0-z] matches characters between NULL and z
Negated character class [^ Negates the character class, so it matches any character not in the set. [^a-d] matches any character except a, b, c or d
Literal opening bracket [ An [ inside a character class adds [ to the class. [ab[cd]ef] matches aef], bef], [ef], cef], and def]
Nested character class [ An [ inside a character class starts a nested character class. [ab[cd]ef] is the same as [abcdef]
Character class subtraction [base-[subtract]] Removes all characters in the “subtract” class from the “base” class. [a-z-[aeiuo]] matches a single letter that is not a vowel. ✔ (Opt)
Character class intersection [base&&[intersect]] Reduces the character class to the characters present in both “base” and “intersect”. [a-z&&[^aeiuo]] matches a single letter that is not a vowel.
Character class intersection [base&&intersect] Reduces the character class to the characters present in both “base” and “intersect”. [a-z&&[^aeiuo]] matches a single letter that is not a vowel.
Character escape \n, \r and \t Add an LF character, a CR character, or a tab character to the character class. [\n\r\t] matches a line feed, a carriage return, or a tab.
Character escape \a Add the “alert” or “bell” control character (ASCII 0x07) to the character class. [\a\t] matches a bell or a tab character.
Character escape \b Add the “backspace” control character (ASCII 0x08) to the character class. [\b\t] matches a backspace or a tab character.
Character escape \B Add a backslash to the character class.
Character escape \e Add the “escape” control character (ASCII 0x1B) to the character class. [\e\t] matches an escape or a tab character.
Character escape \f Add the “form feed” control character (ASCII 0x0C) to the character class. [\f\t] matches a form feed or a tab character.
Character escape \v Add the “vertical tab” control character (ASCII 0x0B) to the character class. [\v\t] matches a vertical tab or a tab character.
POSIX class [:alpha:] Adds the members of the named POSIX character class. [[:digit:][:lower:]] is the same as [0-9a-z]
Negated POSIX class [:^alpha:] Adds everything except the members of the named POSIX character class. [5[:^digit:]] matches 5 or any non-digit.
POSIX shorthand class [:d:], [:s:], [:w:] Shorthands for the “digit”, “space”, and “word” classes. [[:s:][:d:]] matches space, tab, line break, or 0-9
POSIX shorthand class [:l:], [:u:] Shorthands for the “lower” and “upper” classes. [[:u:]][[:l:]] matches Aa but not aA.
POSIX shorthand class [:h:] Shorthand for the “blank” class. [[:h:]] matches a space.
POSIX shorthand class [:v:] Shorthand for the “vertical space” class. [[:v:]] matches any single vertical whitespace character.
POSIX class Any supported \p{...} syntax \p{...} syntax can be used inside character classes. [\p{Digit}\p{Lower}] matches one of 0-9 or a-z
\p to identify POSIX classes \p{SomePosixClass} Matches a single character from POSIX class “SomePosixClass”. May be outside a char class. \p{Digit} matches any single digit.
\p to identify POSIX classes \p{IsSomePosixClass} Matches a single character from POSIX class “SomePosixClass”. May be outside a char class. \p{IsDigit} matches any single digit.
POSIX collation sequence [.span-ll.] Matches a POSIX collation sequence. [[.span-ll.]] matches ll in the Spanish locale
POSIX character equivalence [=x=] Matches a POSIX character equivalence. [[=e=]] matches e, é, è and ê in the French locale
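
A few of the basic character class rules above can be sketched with Python's re module, used here only as a stand-in that is not part of the chart. Python has no class subtraction or intersection syntax, so the last line emulates the "letter that is not a vowel" example with a negative lookahead; that substitution is an assumption of this sketch, not a feature of any engine listed.

```python
import re

# Negated class: any character except a, b, c or d.
not_a_to_d = re.findall(r"[^a-d]", "abcdef")

# Backslash escapes inside a class: matches ^ or ].
escaped = re.findall(r"[\^\]]", "a^b]c")

# Ranges: any ASCII letter or digit.
alnum = re.findall(r"[a-zA-Z0-9]", "a-Z_9!")

# Emulating [a-z-[aeiou]] (subtraction) with a lookahead.
consonants = re.findall(r"(?![aeiou])[a-z]", "regex")

print(not_a_to_d, escaped, alnum, consonants)
```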

Shorthand Character Classes

Feature Syntax Description Example specRegex .NET std::regex PCRE2 RE2
Shorthand Any shorthand outside character classes Shorthands can be used outside character classes. \w matches a single word character ✔ (Opt)
Shorthand Any shorthand inside character classes Shorthands can be used inside character classes. [\w] matches a single word character ✔ (Opt)
Shorthand Negated shorthand inside character classes. Negated shorthands can be used inside character classes. [\W] matches a single non-word character ✔ (Opt)
Shorthand \d Adds all digits to the class, or matches a single digit. [\d] or \d match a single digit
Shorthand \w Adds all word characters to the class, or matches a single word character. [\w]/\w match a single word character
Shorthand \s Adds all whitespace to the class, or matches a single whitespace character. [\s]/\s match a single whitespace character
Shorthand \l/\u Adds all lowercase/uppercase characters to the class, or matches one such character. \u\l matches Aa but not aA.
Shorthand \v Adds all vertical whitespace characters to the class, or matches one such character. [\v]/\v match a single vertical whitespace character ✔ (Opt)
Shorthand \h Adds all horizontal whitespace characters to the class, or matches one such character. [\h]/\h match a single horizontal whitespace character ✔ (Opt)
Shorthand \h Adds all hex digit characters to the class, or matches one such character. [\h]/\h match a single hex digit ✔ (Opt)
Shorthand \i Adds all characters allowed at the start of an XML identifier (or matches one). \i\c* matches an XML name
Shorthand \c Adds all characters allowed in an XML identifier after the first. \i\c* matches an XML name
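
The common shorthands behave the same inside and outside a character class in most engines. A minimal sketch with Python's re module (a stand-in, not one of the charted engines; note that Python's \d and \w are Unicode-aware by default, which does not matter for this ASCII sample):

```python
import re

sample = "a1 b_2!"

# Shorthand outside a character class.
digits = re.findall(r"\d", sample)
spaces = re.findall(r"\s", sample)

# Shorthand and negated shorthand inside a character class.
word_chars = re.findall(r"[\w]", sample)
non_word = re.findall(r"[\W]", sample)

print(digits, spaces, word_chars, non_word)
```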

Anchors

Feature Syntax Description Example specRegex .NET std::regex PCRE2 RE2
String anchor ^ Matches at the start of the string the regex pattern is applied to. ^. matches a in abc\ndef
String anchor $ Matches at the end of the string the regex pattern is applied to. .$ matches f in abc\ndef
String anchor $ Matches before the final line break, if any, as well as the end of the string. .$ matches f in abc\ndef\n
Line anchor ^ Matches after each line break as well as the start of the string. ^. matches a and d in abc\ndef ✔ (Opt) ✔ (Opt) ✔ (Opt)
Line anchor $ Matches before each line break as well as the end of the string. .$ matches c and f in abc\ndef ✔ (Opt) ✔ (Opt) ✔ (Opt)
String anchor \A Matches at the start of the string the regex pattern is applied to. \A\w matches only a in abc
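
The string-anchor and line-anchor rows describe the same syntax under different options. The sketch below reproduces the chart's examples with Python's re module, where re.MULTILINE plays the role of the line-anchor option; this is a stand-in for illustration and specRegex's option names may differ.

```python
import re

text = "abc\ndef"

# String anchors: ^ only matches at the very start of the string.
string_start = re.findall(r"^.", text)

# Line anchors: ^ and $ also match after and before each line break.
line_starts = re.findall(r"^.", text, flags=re.MULTILINE)
line_ends = re.findall(r".$", text, flags=re.MULTILINE)

# In Python, $ also matches before a final line break.
before_final_break = re.findall(r".$", "abc\ndef\n")

# \A always means start of string, regardless of options.
start_only = re.findall(r"\A\w", "abc")

print(string_start, line_starts, line_ends, before_final_break, start_only)
```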

Word Boundaries

Feature Syntax Description Example specRegex .NET std::regex PCRE2 RE2
Word boundary \b Matches a position that is followed by a word character but not preceded by one, or that is preceded by a word character but not followed by one. \b. matches a, the space, and d in abc def ✔ (Opt) ✔ (Opt) ✔ (Opt)
Word boundary \B Matches at a position that is preceded and followed by a word character, or that is not preceded and not followed by a word character. \B. matches b, c, e, and f in abc def ✔ (Opt) ✔ (Opt) ✔ (Opt)
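
Both word-boundary examples can be checked directly. This sketch uses Python's re module as a stand-in (it is not one of the engines in the chart):

```python
import re

text = "abc def"

# \b. : a character that starts right at a word boundary.
at_boundary = re.findall(r"\b.", text)

# \B. : a character that starts at a position that is not a word boundary.
not_at_boundary = re.findall(r"\B.", text)

print(at_boundary, not_at_boundary)
```

The space is included in the first result because the position between c and the space is itself a boundary.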

Quantifiers

Feature Syntax Description Example specRegex .NET std::regex PCRE2 RE2
Greedy quantifier ? Makes the preceding item optional. Greedy, so prefers to match if possible. abc? matches abc or ab
Lazy quantifier ?? Makes the preceding item optional. Lazy, so prefers not to match. abc?? matches ab or abc ✔ (Opt)
Possessive quantifier [1] ?+ Makes the preceding item optional. Possessive, so will never relinquish a match. abc?+c matches abcc but not abc
Greedy quantifier * Match preceding item 0 or more times. Greedy, so prefers to match if possible. ".*" matches "def" "ghi" in abc "def" "ghi" jkl
Lazy quantifier *? Match preceding item 0 or more times. Lazy, so prefers not to match. ".*?" matches "def" and "ghi" in abc "def" "ghi" jkl ✔ (Opt)
Possessive quantifier [2] *+ Match preceding item 0 or more times. Possessive, so will never relinquish a match. ".*+" can never match anything
Greedy quantifier + Match preceding item 1 or more times. Greedy, so prefers to match if possible. ".+" matches "def" "ghi" in abc "def" "ghi" jkl
Lazy quantifier +? Match preceding item 1 or more times. Lazy, so prefers not to match. ".+?" matches "def" and "ghi" in abc "def" "ghi" jkl ✔ (Opt)
Possessive quantifier [3] ++ Match preceding item 1 or more times. Possessive, so will never relinquish a match. ".++" can never match anything
Fixed quantifier {N} for integer N >= 0 Match preceding item N times. a{3} matches aaa
Greedy ranged quantifier {N,M} for integers N,M >= 0, M>=N Match preceding item between N and M times, preferring more repetitions. a{2,3} matches aa and aaa.
Lazy ranged quantifier {N,M}? for integers N,M >= 0, M>=N Match preceding item between N and M times, preferring fewer repetitions. a{2,4}? matches aa, aaa or aaaa
Possessive ranged quantifier [4] {N,M}+ for integers N,M >= 0, M>=N Match preceding item between N and M times, possessive. a{2,4}+a matches aaaaa but not aaaa
Greedy variable quantifier {N,} for integer N >= 0 Match preceding item at least N times, preferring more repetitions. a{2,} matches aa, aaa, aaaa, etc.
Lazy variable quantifier {N,}? for integer N >= 0 Match preceding item at least N times, preferring fewer repetitions. a{2,}? matches aa in aaaaa
Possessive variable quantifier [5] {N,}+ for integer N >= 0 Match preceding item at least N times, possessive. a{2,}+a never matches anything
Greedy variable quantifier {,N} for integer N >= 0 Match preceding item no more than N times, preferring more repetitions. a{,4} matches aaaa, aaa, aa, a, or the empty string
Lazy variable quantifier {,N}? for integer N >= 0 Match preceding item no more than N times, preferring fewer repetitions. a{,4}? matches the empty string, a, aa, aaa or aaaa
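
The greedy and lazy rows above differ only in which match is preferred, which the quoted-string example shows well. A minimal sketch with Python's re module as a stand-in (possessive quantifiers are left out because older Python versions do not support them):

```python
import re

text = 'abc "def" "ghi" jkl'

# Greedy: .* runs to the last quote, so both strings come back as one match.
greedy = re.findall(r'".*"', text)

# Lazy: .*? stops at the first closing quote, giving two matches.
lazy = re.findall(r'".*?"', text)

# Ranged quantifiers follow the same rule.
greedy_range = re.search(r"a{2,3}", "aaaaa").group()
lazy_range = re.search(r"a{2,}?", "aaaaa").group()

print(greedy, lazy, greedy_range, lazy_range)
```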

Unicode

Feature Syntax Description Example specRegex .NET std::regex PCRE2 RE2
Grapheme \X Matches a single Unicode grapheme, whether encoded as a single code point or multiple code points using combining marks. \X matches à encoded as U+0061 U+0300, à encoded as U+00E0, ©, etc.
Code point \uFFFF where FFFF are 4 hex digits Matches a specific Unicode code point. \u00E0 matches à encoded as U+00E0 only. \u00A9 matches © ✔ (Opt)
Code point \u{FFFF} where FFFF are 1-4 hex digits Matches a specific Unicode code point. \u{E0} matches à encoded as U+00E0 only. \u{A9} matches ©
Code point \xFFFF where FFFF are 4 hex digits Matches a specific Unicode code point. \x00E0 matches à encoded as U+00E0 only. \x00A9 matches © ✔ (Opt)
Code point \x{FFFF} where FFFF are 1-4 hex digits Matches a specific Unicode code point. \x{E0} matches à encoded as U+00E0 only. \x{A9} matches ©
Unicode category \pL where L is a Unicode category Matches a single Unicode code point in the specified Unicode category. \pL matches à encoded as U+00E0; \pS matches ©
Unicode category \p{L} where L is a Unicode category Matches a single Unicode code point in the specified Unicode category. \p{L} matches à encoded as U+00E0; \p{S} matches ©
Unicode category \p{IsL} where L is a Unicode category Matches a single Unicode code point in the specified Unicode category. \p{IsL} matches à encoded as U+00E0; \p{IsS} matches ©
Negated unicode category \PL where L is a Unicode category Matches a single Unicode code point not in the specified Unicode category. \PS matches à encoded as U+00E0; \PL matches ©
Longhand category \p{Category} Matches a single Unicode code point in the specified Unicode category. \p{Letter} matches à encoded as U+00E0; \p{Symbol} matches ©
Longhand category \p{IsCategory} Matches a single Unicode code point in the specified Unicode category. \p{IsLetter} matches à encoded as U+00E0; \p{IsSymbol} matches ©
Unicode script \p{Script} Matches a single Unicode code point in the specified Unicode script. \p{Greek} matches Ω
Unicode script \p{IsScript} Matches a single Unicode code point in the specified Unicode script. \p{IsGreek} matches Ω
Unicode block \p{Block} Matches a single Unicode code point in the specified Unicode block. \p{Arrows} matches any of the code points from U+2190 until U+21FF (← until ⇿)
Unicode block \p{InBlock} Matches a single Unicode code point in the specified Unicode block. \p{InArrows} matches any of the code points from U+2190 until U+21FF (← until ⇿)
Unicode block \p{IsBlock} Matches a single Unicode code point in the specified Unicode block. \p{IsArrows} matches any of the code points from U+2190 until U+21FF (← until ⇿)
Negated unicode property \P{Property} Matches a single code point that lacks the named property (block/script/category). \P{L} matches ©
Negated unicode property \p{^Property} Matches a single code point that lacks the named property (block/script/category). \p{^L} matches ©
Unicode property \P{^Property} Double negation: matches a single code point that has the named property (block/script/category). \P{^L} matches q
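
The phrase "à encoded as U+00E0 only" in the code point rows refers to the fact that the same visible character can be stored as one code point or as a base letter plus a combining mark. The sketch below shows the distinction with Python's re and unicodedata modules; Python is a stand-in here, and it does not support \X or \p{...}, so only the \uFFFF form is shown.

```python
import re
import unicodedata

composed = "\u00e0"      # à as the single code point U+00E0
decomposed = "a\u0300"   # a followed by the combining grave accent U+0300

# A code point escape matches the precomposed form only.
matches_composed = re.fullmatch(r"\u00E0", composed) is not None
matches_decomposed = re.fullmatch(r"\u00E0", decomposed) is not None

# Normalizing to NFC first makes both spellings match.
normalized = unicodedata.normalize("NFC", decomposed)
matches_after_nfc = re.fullmatch(r"\u00E0", normalized) is not None

print(matches_composed, matches_decomposed, matches_after_nfc)
```

A grapheme token such as \X, where supported, avoids the normalization step by matching either encoding.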

Named Groups and Backreferences

Feature Syntax Description Example specRegex .NET std::regex PCRE2 RE2
Named capture group (?<name>regex) A capture group named name. name must start with a letter (?<x>abc){3} matches abcabcabc. The group x matches abc
Named capture group (?'name'regex) A capture group named name. name must start with a letter (?'x'abc){3} matches abcabcabc. The group x matches abc
Named capture group (?P<name>regex) A capture group named name. name must start with a letter (?P<x>abc){3} matches abcabcabc. The group x matches abc
Duplicate named groups Any named group Two named groups can share the same name. (?<x>a)|(?<x>b) matches a or b.
Duplicate named groups Any named group Named groups that share the same name are treated as one and the same group.
Duplicate named groups Any named group Backreferences refer to the leftmost participating group with the given name.
Named backreference \k<name> Refers to the text matched by group name (?<x>abc|def)=\k<x> matches abc=abc or def=def, but not abc=def or def=abc.
Named backreference \k'name' Refers to the text matched by group name (?'x'abc|def)=\k'x' matches abc=abc or def=def, but not abc=def or def=abc.
Named backreference \k{name} Refers to the text matched by group name (?'x'abc|def)=\k{x} matches abc=abc or def=def, but not abc=def or def=abc.
Named backreference \g{name} Refers to the text matched by group name (?'x'abc|def)=\g{x} matches abc=abc or def=def, but not abc=def or def=abc.
Named backreference (?P=name) Refers to the text matched by group name (?P<x>abc|def)=(?P=x) matches abc=abc or def=def, but not abc=def or def=abc.
Failed backreference Any supported backreference Backreferences to groups that did not participate in the match attempt fail to match (?<x>a)?\k<x> matches aa but fails to match b.
Nested backreference Any supported backreference Backreferences can be used inside the group they reference. (?<x>a\k<x>?){3} matches aaaaaa.
Forward backreference Any supported backreference Backreferences can be used before the group they reference. (\k<x>?(?<x>a)){3} matches aaaaa.
Named capturing group Any supported named capture group A number is a valid name for a capturing group. (?<17>abc){3} matches abcabcabc. The group named “17” matches abc.
Named capturing group Any capture group named with a number If the name of the group is a number, that becomes the group’s name and the group’s number. (?<17>abc|def)=\17 matches abc=abc or def=def, but not abc=def or def=abc.
Named capturing group Any supported named capture group A negative number is a valid name for a capturing group. (?<-17>abc){3} matches abcabcabc. The group named “-17” matches abc.
Named backreference Any supported backreference A negative number can be used in a named backreference to refer to a negatively-named group
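
The named group and backreference examples can be tried with the (?P<name>...) and (?P=name) forms, which are the ones Python's re module supports. Python is used as a stand-in only; it rejects nested backreferences and numeric group names, so those rows are not covered by this sketch.

```python
import re

# Named group plus named backreference: both sides must be identical.
pair = re.compile(r"(?P<x>abc|def)=(?P=x)")
same = pair.fullmatch("abc=abc")
different = pair.fullmatch("abc=def")

# A repeated named group keeps the text of its last iteration.
repeated = re.fullmatch(r"(?P<x>abc){3}", "abcabcabc")

# A backreference to a group that did not participate fails to match.
participated = re.fullmatch(r"(?P<x>a)?(?P=x)", "aa")
failed = re.search(r"(?P<x>a)?(?P=x)", "b")

print(same.group("x"), different, repeated.group("x"), bool(participated), failed)
```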

Special Groups

Feature Syntax Description Example specRegex .NET std::regex PCRE2 RE2
Comment (?#comment) Ignored by the regex engine. a(?#foobar)b matches ab
Branch reset group (?|foo|bar|baz) Capture group numbering starts from the same offset in each branch. (?|(a)|(b)) has only group 1
Atomic group [6] (?>regex) Prevent backtracking into the group after it matches. a(?>bc|b)c matches abcc but not abc
Positive lookahead (?=regex) Assert that regex matches immediately after this position. t(?=s) matches the second t in streets ✔ (Opt)
Negative lookahead (?!regex) Assert that regex doesn’t match immediately after this position. t(?!s) matches the first t in streets ✔ (Opt)
Positive lookbehind (?<=regex) Assert that regex matches immediately before this position. (?<=s)t matches the first t in streets
Negative lookbehind (?<!regex) Assert that regex doesn’t match immediately before this position. (?<!s)t matches the second t in streets.
Lookbehind (?<=regex\|longer regex) Alternatives inside lookbehind can differ in length.
Lookbehind (?<=x{n,m}) Quantifiers with maximum repetition count can be used inside lookbehind. (?<=s\w{1,7})t matches the fourth t in twisty streets.
Lookbehind (?<=regex) The full regular expression syntax can be used inside lookbehind. (?<=s\w+)t matches only the fourth t in twisty streets.
Lookbehind (group)(?<=\1) Backreferences can be used inside lookbehind. (\w).+(?<=\1) matches twisty street in twisty streets.
Exclude text from match \K Text left of \K is omitted from overall match, but groups are unaffected. s\Kt matches only the first t in streets.
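
The four lookaround examples on streets are easy to confuse, so the sketch below reports the position of each match. It uses Python's re module as a stand-in; the helper name starts is made up for this example. Python only accepts fixed-width lookbehind, so the variable-length lookbehind rows cannot be reproduced this way.

```python
import re

word = "streets"  # t occurs at index 1 and at index 5


def starts(pattern, text):
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]


lookahead = starts(r"t(?=s)", word)       # t followed by s
neg_lookahead = starts(r"t(?!s)", word)   # t not followed by s
lookbehind = starts(r"(?<=s)t", word)     # t preceded by s
neg_lookbehind = starts(r"(?<!s)t", word) # t not preceded by s

print(lookahead, neg_lookahead, lookbehind, neg_lookbehind)
```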

Mode Modifiers

Feature Syntax Description Example specRegex .NET std::regex PCRE2 RE2
Mode modifier (?letters) at the start of the regex A mode modifier at the start of the regex affects the whole regex and overrides any options set outside the regex. (?i)a matches a and A.
Mode modifier (?letters) in the middle of the regex A mode modifier affects regex tokens to the right of it, until overridden by a contradictory mode modifier. te(?i)st matches test and teST but not TEst or TEST
Mode modifier group (?letters:regex) Non-capturing group with modifiers that affect only the part of the regex inside the group. te(?i:st) matches test and teST but not TEst or TEST
Negative modifier (?on-off) and (?on-off:regex) Modifier letters (if any) before the hyphen are turned on, while modifier letters after the hyphen are turned off. (?i)te(?-i)st matches test and TEst but not teST or TEST
Reset modifiers (?^) Turn off all options. The caret can be followed by modifier letters to turn some options back on. (?i)te(?^)st matches test and TEst but not teST or TEST
Case insensitive (?i) Turn on case insensitivity. (?i)a matches a and A
Free spacing (?x) Turn on free-spacing mode to ignore whitespace between regex tokens and allow # comments. (?x)a#b matches a
Freer spacing (?xx) Like (?x), but also allows free spacing inside character classes. (?xx)[ a] matches a but not a space
Tight spacing (?t) Disables free spacing mode. (?t)a#b matches a#b
Single line (?s) Make the dot match all characters including line break characters. (?s).* matches ab\n\ndef in ab\n\ndef
Multi line (?m) Make ^ and $ match at the start and end of each line. (?m)^. matches a and d in ab\n\ndef
Explicit capture (?n) Plain parentheses are non-capturing groups instead of numbered capturing groups. Only named capturing groups actually capture. (?n)(a|b)c is the same as (?:a|b)c
Duplicate named groups (?J) Allow multiple named capturing groups to share the same name. (?J)(?:(?'x'a)|(?'x'b))\k'x' matches aa or bb
Ungreedy quantifiers (?U) Switches the syntax for greedy and lazy quantifiers. (?U)a* is lazy and (?U)a*? is greedy
UNIX lines (?d) When anchors match at line breaks, or dot does not match line breaks, only consider the line feed character as a line break. (?dm)^. matches a and c in a\rb\nc
Literal (?q) Interpret the regular expression as a literal string (excluding the modifier) (?q)[a\]+ matches [a\]+ literally
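
Several of the modifiers above can be tried with Python's re module, which supports (?i), (?x), (?s), (?m) and the scoped (?i:...) form. This is a stand-in sketch: recent Python versions reject a bare (?i) in the middle of a pattern, and modifiers such as (?n), (?J), (?U), (?d), (?t) and (?q) are not available there, so only a subset of the rows is shown.

```python
import re

# Whole-regex modifier at the start of the pattern.
case_insensitive = re.findall(r"(?i)a", "aA")

# Modifier group: only "st" is matched case-insensitively.
scoped = re.compile(r"te(?i:st)")
scoped_results = [bool(scoped.fullmatch(s)) for s in ("test", "teST", "TEst", "TEST")]

# Free spacing: everything after # is a comment, so the pattern is just "a".
free_spacing = re.fullmatch(r"(?x)a#b", "a") is not None

# Single line: the dot also matches line breaks.
single_line = re.search(r"(?s).*", "ab\n\ndef").group()

# Multi line: ^ matches at the start of each line.
multi_line = re.findall(r"(?m)^.", "ab\n\ndef")

print(case_insensitive, scoped_results, free_spacing, repr(single_line), multi_line)
```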

  1-6. The main application for atomic groups and possessive quantifiers is to avoid catastrophic backtracking. Since specRegex is not a backtracking-based regex engine, it is immune to this problem by design. One side effect of this design is that atomic groups and possessive quantifiers are difficult to implement, but it also makes them largely unnecessary.