C# regular expressions
A regular expression is a pattern that can be matched against input text.
The .NET Framework provides a regular expression engine that allows this kind of matching.
A pattern consists of one or more character literals, operators, or constructs.
If you are not yet familiar with regular expressions, you can read our regular expressions tutorial.
Defining regular expressions
The following categories of characters, operators, and constructs are used to define regular expressions:
Character escapes
Character classes
Anchors
Grouping constructs
Quantifiers
Backreference constructs
Alternation constructs
Substitutions
Miscellaneous constructs
Character escapes
A backslash character (`\`) in a regular expression indicates that the character following it either is a special character or should be interpreted literally.
The following table lists the character escapes:
Escape character | Description | Pattern | Match |
---|---|---|---|
`\a` | Matches the alarm (bell) character, \u0007. | `\a` | "\u0007" in "Warning!" + '\u0007' |
`\b` | In a character class, matches a backspace, \u0008. | `[\b]{3,}` | "\b\b\b\b" in "\b\b\b\b" |
`\t` | Matches a tab, \u0009. | `(\w+)\t` | "Name\t" and "Addr\t" in "Name\tAddr\t" |
`\r` | Matches a carriage return, \u000D. (`\r` is not equivalent to the newline character `\n`.) | `\r\n(\w+)` | "\r\nHello" in "\r\nHello\nWorld." |
`\v` | Matches a vertical tab, \u000B. | `[\v]{2,}` | "\v\v\v" in "\v\v\v" |
`\f` | Matches a form feed, \u000C. | `[\f]{2,}` | "\f\f\f" in "\f\f\f" |
`\n` | Matches a newline, \u000A. | `\r\n(\w+)` | "\r\nHello" in "\r\nHello\nWorld." |
`\e` | Matches an escape, \u001B. | `\e` | "\x001B" in "\x001B" |
`\nnn` | Specifies a character by its octal representation (nnn consists of two or three digits). | `\w\040\w` | "a b" and "c d" in "a bc d" |
`\xnn` | Specifies a character by its hexadecimal representation (nn consists of exactly two digits). | `\w\x20\w` | "a b" and "c d" in "a bc d" |
`\cX` or `\cx` | Matches the ASCII control character specified by X or x, where X or x is the letter of the control character. | `\cC` | "\x0003" in "\x0003" (Ctrl-C) |
`\unnnn` | Matches a Unicode character by its hexadecimal representation (exactly four digits, represented by nnnn). | `\w\u0020\w` | "a b" and "c d" in "a bc d" |
`\` | When followed by a character that is not recognized as an escape, matches that character. | `\d+[\+-x\*]\d+` | "2+2" and "3*9" in "(2+2) * 3*9" |
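Because C# string literals have escape sequences of their own, it is easy to confuse a C# escape with a regex escape. The following minimal sketch (assuming a standard .NET console project; `EscapeDemo` and its `Find` helper are hypothetical names for illustration, not part of the `Regex` API) uses verbatim strings (`@"..."`) so that sequences such as `\w`, `\t`, `\x20`, and `\u0020` reach the regex engine unchanged, and shows `Regex.Escape` adding backslashes automatically.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Hypothetical helper: joins the text of every match with '|' for easy printing and comparison.
    public static string Find(string input, string pattern) =>
        string.Join("|", Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value));

    public static void Main()
    {
        // The input uses C# escapes (real tab characters); the pattern uses regex escapes.
        string tabs = Find("Name\tAddr\t", @"(\w+)\t");
        Console.WriteLine(tabs.Replace("\t", "\\t"));             // Name\t|Addr\t

        // \x20 and \u0020 are two ways of writing a space.
        Console.WriteLine(Find("a bc d", @"\w\x20\w"));           // a b|c d
        Console.WriteLine(Find("a bc d", @"\w\u0020\w"));         // a b|c d

        // A backslash in front of a metacharacter such as + or * makes it match literally.
        Console.WriteLine(Find("(2+2) * 3*9", @"\d+[\+\*]\d+"));  // 2+2|3*9

        // Regex.Escape inserts the backslashes for you, e.g. for text taken from user input.
        Console.WriteLine(Regex.Escape("(2+2)"));                 // \(2\+2\)
    }
}
```

Note that writing `"\w"` in an ordinary (non-verbatim) C# string is a compile error, which is why regex patterns are usually written as verbatim strings.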
Character classes
A character class matches any one of a set of characters.
The following table lists the character classes:
Character class | Description | Pattern | Match |
---|---|---|---|
`[character_group]` | Matches any single character in character_group. By default, the match is case-sensitive. | `[mn]` | "m" in "mat"; "m" and "n" in "moon" |
`[^character_group]` | Negation: matches any single character that is not in character_group. By default, characters in character_group are case-sensitive. | `[^aei]` | "v" and "l" in "avail" |
`[first-last]` | Character range: matches any single character in the range from first to last. | `[b-d]` | `[b-d]irds` matches "birds", "cirds", and "dirds" |
`.` | Wildcard: matches any single character except `\n`. To match a literal period character (. or \u002E), precede it with the escape character (`\.`). | `a.e` | "ave" in "have"; "ate" in "mate" |
`\p{name}` | Matches any single character in the Unicode general category or named block specified by name. | `\p{Lu}` | "C" and "L" in "City Lights" |
`\P{name}` | Matches any single character that is not in the Unicode general category or named block specified by name. | `\P{Lu}` | "i", "t", and "y" in "City" |
`\w` | Matches any word character. | `\w` | "R", "o", "m", and "1" in "Room#1" |
`\W` | Matches any non-word character. | `\W` | "#" in "Room#1" |
`\s` | Matches any white-space character. | `\w\s` | "D " in "ID A1.3" |
`\S` | Matches any non-white-space character. | `\s\S` | " _" in "int __ctr" |
`\d` | Matches any decimal digit. | `\d` | "4" in "4 = IV" |
`\D` | Matches any character that is not a decimal digit. | `\D` | " ", "=", " ", "I", and "V" in "4 = IV" |
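A minimal sketch of several character classes from the table, assuming a standard .NET console project (`CharClassDemo` and `Find` are hypothetical names). It also shows that ranges are case-sensitive unless `RegexOptions.IgnoreCase` is passed.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CharClassDemo
{
    // Hypothetical helper: joins the text of every match with '|'.
    public static string Find(string input, string pattern, RegexOptions options = RegexOptions.None) =>
        string.Join("|", Regex.Matches(input, pattern, options).Cast<Match>().Select(m => m.Value));

    public static void Main()
    {
        Console.WriteLine(Find("moon", "[mn]"));                  // m|n
        Console.WriteLine(Find("avail", "[^aei]"));               // v|l

        // Ranges are case-sensitive by default, so "Birds" is skipped...
        Console.WriteLine(Find("birds Birds dirds", "[b-d]irds"));                          // birds|dirds
        // ...unless case-insensitive matching is requested.
        Console.WriteLine(Find("birds Birds dirds", "[b-d]irds", RegexOptions.IgnoreCase)); // birds|Birds|dirds

        // Unicode general category Lu: uppercase letters.
        Console.WriteLine(Find("City Lights", @"\p{Lu}"));        // C|L

        // \d and \D split the same input into complementary sets (spaces shown as '_').
        Console.WriteLine(Find("4 = IV", @"\d"));                 // 4
        Console.WriteLine(Find("4 = IV", @"\D").Replace(" ", "_")); // _|=|_|I|V
    }
}
```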
Anchors
Anchors, or atomic zero-width assertions, cause a match to succeed or fail depending on the current position in the string, but they do not cause the engine to advance through the string or consume characters.
The following table lists the anchors:
Assertion | Description | Pattern | Match |
---|---|---|---|
`^` | The match must start at the beginning of the string or (in multiline mode) of a line. | `^\d{3}` | "567" in "567-777-" |
`$` | The match must occur at the end of the string, or before `\n` at the end of the line or string. | `-\d{4}$` | "-2012" in "8-12-2012" |
`\A` | The match must occur at the start of the string. | `\A\w{4}` | "Code" in "Code-007-" |
`\Z` | The match must occur at the end of the string or before `\n` at the end of the string. | `-\d{3}\Z` | "-007" in "Bond-901-007" |
`\z` | The match must occur at the end of the string. | `-\d{3}\z` | "-333" in "-901-333" |
`\G` | The match must start at the position where the previous match ended. | `\G\(\d\)` | "(1)", "(3)", and "(5)" in `(1)(3)(5)[7](9)` |
`\b` | Matches a word boundary, that is, the position between a word character and a non-word character. | `er\b` | Matches "er" in "never", but not "er" in "verb" |
`\B` | Matches a position that is not a word boundary. | `er\B` | Matches "er" in "verb", but not "er" in "never" |
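The following minimal sketch (assuming a standard .NET console project; `AnchorDemo` and `Find` are hypothetical names) illustrates three differences that are easy to miss in the table: `^` only matches after a newline when `RegexOptions.Multiline` is set, `\G` stops at the first gap between matches, and `\Z` tolerates a trailing newline while `\z` does not.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Hypothetical helper: joins the text of every match with '|'.
    public static string Find(string input, string pattern, RegexOptions options = RegexOptions.None) =>
        string.Join("|", Regex.Matches(input, pattern, options).Cast<Match>().Select(m => m.Value));

    public static void Main()
    {
        // ^ anchors to the start of the string; with Multiline it also anchors after each \n.
        Console.WriteLine(Find("123\n456", @"^\d{3}"));                          // 123
        Console.WriteLine(Find("123\n456", @"^\d{3}", RegexOptions.Multiline));  // 123|456

        // \G forces each match to begin where the previous one ended, so the scan stops at "[7]".
        Console.WriteLine(Find("(1)(3)(5)[7](9)", @"\G\(\d\)"));                  // (1)|(3)|(5)

        // \Z accepts one trailing newline; \z requires the absolute end of the string.
        Console.WriteLine(Regex.IsMatch("Bond-901-007\n", @"-\d{3}\Z"));         // True
        Console.WriteLine(Regex.IsMatch("Bond-901-007\n", @"-\d{3}\z"));         // False

        // Position of "er" at a word boundary vs. inside a word.
        Console.WriteLine(Regex.Match("never verb", @"er\b").Index);             // 3
        Console.WriteLine(Regex.Match("never verb", @"er\B").Index);             // 7
    }
}
```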
Grouping constructs
Grouping constructs delineate the subexpressions of a regular expression and are usually used to capture substrings of the input string.
This section can be difficult to follow; our tutorials on regular expression alternation, lookahead assertions, and lookbehind assertions may help.
The following table lists the grouping constructs:
Grouping construct | Description | Pattern | Match |
---|---|---|---|
`(subexpression)` | Captures the matched subexpression and assigns it a zero-based ordinal number. | `(\w)\1` | "ee" in "deep" |
`(?<name>subexpression)` | Captures the matched subexpression into a named group. | `(?<double>\w)\k<double>` | "ee" in "deep" |
`(?<name1-name2>subexpression)` | Defines a balancing group. | `(((?'Open'\()[^\(\)]*)+((?'Close-Open'\))[^\(\)]*)+)*(?(Open)(?!))$` | `((1-3)*(3-1))` in `3+2^((1-3)*(3-1))` |
`(?:subexpression)` | Defines a non-capturing group. | `Write(?:Line)` | "WriteLine" in "Console.WriteLine()" |
`(?imnsx-imnsx:subexpression)` | Enables or disables the specified options within subexpression. | `A\d{2}(?i:\w+)\b` | "A12xl" and "A12XL" in "A12xl A12XL a12xl" |
`(?=subexpression)` | Zero-width positive lookahead assertion. | `\w+(?=\.)` | "is", "ran", and "out" in "He is. The dog ran. The sun is out." |
`(?!subexpression)` | Zero-width negative lookahead assertion. | `\b(?!un)\w+\b` | "sure" and "used" in "unsure sure unity used" |
`(?<=subexpression)` | Zero-width positive lookbehind assertion. | `(?<=19)\d{2}\b` | "99", "50", and "05" in "1851 1999 1950 1905 2003" |
`(?<!subexpression)` | Zero-width negative lookbehind assertion. | `(?<!wo)man\b` | "man" in "Hi woman Hi man" |
`(?>subexpression)` | Atomic group: a nonbacktracking (also known as "greedy") subexpression. | `[13579](?>A+B+)` | "1ABB", "3ABB", and "5AB" in "1ABB 3ABBC 5AB 5AC" |
Example
```csharp
using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string input = "1851 1999 1950 1905 2003";
        string pattern = @"(?<=19)\d{2}\b";
        foreach (Match match in Regex.Matches(input, pattern))
            Console.WriteLine(match.Value);
    }
}
```
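The example above exercises only a lookbehind. The following minimal sketch (assuming a standard .NET console project; `GroupDemo`, `Find`, and `DescribeDate` are hypothetical names) reads named groups through `Match.Groups["name"]`, shows that `(?:...)` adds no entry to `Groups`, and runs the lookahead and atomic-group rows of the table.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Hypothetical helper: joins the text of every match with '|'.
    public static string Find(string input, string pattern) =>
        string.Join("|", Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value));

    // Hypothetical helper: reads the named groups of a yyyy-mm-dd date; returns null if there is no match.
    public static string DescribeDate(string text)
    {
        Match m = Regex.Match(text, @"^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$");
        if (!m.Success)
            return null;
        string year = m.Groups["year"].Value;
        string month = m.Groups["month"].Value;
        string day = m.Groups["day"].Value;
        return "year=" + year + ", month=" + month + ", day=" + day;
    }

    public static void Main()
    {
        Console.WriteLine(DescribeDate("2003-07-15"));   // year=2003, month=07, day=15

        // (?:...) groups without capturing, so only group 0 (the whole match) exists.
        Console.WriteLine(Regex.Match("Console.WriteLine()", @"Write(?:Line)?").Groups.Count); // 1
        Console.WriteLine(Regex.Match("Console.WriteLine()", @"Write(Line)?").Groups.Count);   // 2

        // Lookahead: the word must be followed by a period, but the period is not consumed.
        Console.WriteLine(Find("He is. The dog ran. The sun is out.", @"\w+(?=\.)")); // is|ran|out

        // Atomic group: once A+B+ has matched, the engine does not backtrack into it.
        Console.WriteLine(Find("1ABB 3ABBC 5AB 5AC", @"[13579](?>A+B+)"));         // 1ABB|3ABB|5AB
    }
}
```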
Quantifiers
A quantifier specifies how many instances of the previous element (which can be a character, a group, or a character class) must be present in the input string for a match to occur.
The following table lists the quantifiers:
Quantifier | Description | Pattern | Match |
---|---|---|---|
`*` | Matches the previous element zero or more times. | `\d*\.\d` | ".0", "19.9", "219.9" |
`+` | Matches the previous element one or more times. | `be+` | "bee" in "been"; "be" in "bent" |
`?` | Matches the previous element zero or one time. | `rai?n` | "ran", "rain" |
`{n}` | Matches the previous element exactly n times. | `,\d{3}` | ",043" in "1,043.6"; ",876", ",543", and ",210" in "9,876,543,210" |
`{n,}` | Matches the previous element at least n times. | `\d{2,}` | "166", "29", "1930" |
`{n,m}` | Matches the previous element at least n times, but no more than m times. | `\d{3,5}` | "166"; "17668"; "19302" in "193024" |
`*?` | Matches the previous element zero or more times, but as few times as possible. | `\d*?\.\d` | ".0", "19.9", "219.9" |
`+?` | Matches the previous element one or more times, but as few times as possible. | `be+?` | "be" in "been"; "be" in "bent" |
`??` | Matches the previous element zero or one time, but as few times as possible. | `rai??n` | "ran", "rain" |
`{n}?` | Matches the previous element exactly n times. | `,\d{3}?` | ",043" in "1,043.6"; ",876", ",543", and ",210" in "9,876,543,210" |
`{n,}?` | Matches the previous element at least n times, but as few times as possible. | `\d{2,}?` | "16", "29", "19", and "30" in "166,29,1930" |
`{n,m}?` | Matches the previous element between n and m times, but as few times as possible. | `\d{3,5}?` | "193" and "024" in "193024" |
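The practical difference between greedy and lazy quantifiers shows up as soon as the text contains more than one possible end point. A minimal sketch, assuming a standard .NET console project (`QuantifierDemo` and `Find` are hypothetical names):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Hypothetical helper: joins the text of every match with '|'.
    public static string Find(string input, string pattern) =>
        string.Join("|", Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value));

    public static void Main()
    {
        // Greedy .+ runs to the last '>' it can reach; lazy .+? stops at the first one.
        Console.WriteLine(Find("<b>bold</b>", "<.+>"));       // <b>bold</b>
        Console.WriteLine(Find("<b>bold</b>", "<.+?>"));      // <b>|</b>

        // Same idea with + and +? on a single letter.
        Console.WriteLine(Find("been", "be+"));               // bee
        Console.WriteLine(Find("been", "be+?"));              // be

        // {n,m} takes as many digits as allowed; {n,m}? takes as few as allowed.
        Console.WriteLine(Find("193024", @"\d{3,5}"));        // 19302
        Console.WriteLine(Find("193024", @"\d{3,5}?"));       // 193|024

        // {n}: exactly three digits after each comma.
        Console.WriteLine(Find("9,876,543,210", @",\d{3}"));  // ,876|,543|,210
    }
}
```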
Backreference constructs
A backreference allows a previously matched subexpression to be identified again later in the same regular expression.
The following table lists the backreference constructs:
Backreference construct | Description | Pattern | Match |
---|---|---|---|
`\number` | Backreference. Matches the value of a numbered subexpression. | `(\w)\1` | "ee" in "seek" |
`\k<name>` | Named backreference. Matches the value of a named subexpression. | `(?<char>\w)\k<char>` | "ee" in "seek" |
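A common use of backreferences is finding accidentally doubled words. The following minimal sketch (assuming a standard .NET console project; `BackreferenceDemo`, `Find`, and `DoubledWord` are hypothetical names) runs the numbered and named forms, then uses `Regex.Replace` with `$1` to keep one copy of each doubled word.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BackreferenceDemo
{
    // \1 must repeat exactly the text that group 1 captured.
    public const string DoubledWord = @"\b(\w+)\s+\1\b";

    // Hypothetical helper: joins the text of every match with '|'.
    public static string Find(string input, string pattern) =>
        string.Join("|", Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value));

    public static void Main()
    {
        string text = "This is is a test test.";

        Console.WriteLine(Find(text, DoubledWord));                       // is is|test test
        // The same search with a named group and \k<word>.
        Console.WriteLine(Find(text, @"\b(?<word>\w+)\s+\k<word>\b"));    // is is|test test

        // Replace each doubled word with a single copy of group 1.
        Console.WriteLine(Regex.Replace(text, DoubledWord, "$1"));       // This is a test.
    }
}
```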
Alternation constructs
Alternation constructs modify a regular expression to enable either/or matching.
The following table lists the alternation constructs:
Alternation construct | Description | Pattern | Match |
---|---|---|---|
`\|` | Matches any one element separated by the vertical bar (\|) character. | `th(e\|is\|at)` | "the" and "this" in "this is the day." |
`(?(expression)yes\|no)` | Matches yes if the regular expression pattern designated by expression matches; otherwise, matches the optional no part. expression is interpreted as a zero-width assertion. | `(?(A)A\d{2}\b\|\b\d{3}\b)` | "A10" and "910" in "A10 C103 910" |
`(?(name)yes\|no)` | Matches yes if name, a named or numbered capturing group, has a match; otherwise, matches the optional no. | `(?<quoted>")?(?(quoted).+?"\|\S+\s)` | "Dogs.jpg " and "\"Yiska playing.jpg\"" in "Dogs.jpg \"Yiska playing.jpg\"" |
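A minimal sketch that runs the three alternation rows of the table, assuming a standard .NET console project (`AlternationDemo` and `Find` are hypothetical names). In a C# verbatim string a double quote is written as `""`, which is why the last pattern looks slightly different from the table.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    // Hypothetical helper: joins the text of every match with '|'.
    public static string Find(string input, string pattern) =>
        string.Join("|", Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value));

    public static void Main()
    {
        // Plain alternation inside a group.
        Console.WriteLine(Find("this is the day.", "th(e|is|at)"));              // this|the

        // (?(A)...) tests a zero-width lookahead for "A" to choose the branch.
        Console.WriteLine(Find("A10 C103 910", @"(?(A)A\d{2}\b|\b\d{3}\b)"));    // A10|910

        // (?(quoted)...) branches on whether the named group captured an opening quote.
        string files = "Dogs.jpg \"Yiska playing.jpg\"";
        Console.WriteLine(Find(files, @"(?<quoted>"")?(?(quoted).+?""|\S+\s)"));
        // Dogs.jpg |"Yiska playing.jpg"
    }
}
```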
Substitutions
Substitutions are language elements that are supported in replacement patterns.
The following table lists the substitution elements:
Character | Description | Pattern | Replacement pattern | Input string | Result string |
---|---|---|---|---|---|
`$number` | Substitutes the substring matched by group number. | `\b(\w+)(\s)(\w+)\b` | `$3$2$1` | "one two" | "two one" |
`${name}` | Substitutes the substring matched by the named group name. | `\b(?<word1>\w+)(\s)(?<word2>\w+)\b` | `${word2} ${word1}` | "one two" | "two one" |
`$$` | Substitutes a literal "$". | `\b(\d+)\s?USD` | `$$$1` | "103 USD" | "$103" |
`$&` | Substitutes a copy of the whole match. | `\$?\d*\.?\d+` | `**$&**` | "$1.30" | `**$1.30**` |
`` $` `` | Substitutes all the text of the input string before the match. | `B+` | `` $` `` | "AABBCC" | "AAAACC" |
`$'` | Substitutes all the text of the input string after the match. | `B+` | `$'` | "AABBCC" | "AACCCC" |
`$+` | Substitutes the last group that was captured. | `B+(C+)` | `$+` | "AABBCCDD" | "AACCDD" |
`$_` | Substitutes the entire input string. | `B+` | `$_` | "AABBCC" | "AAAABBCCCC" |
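Every row of the substitution table can be checked directly with the static `Regex.Replace(input, pattern, replacement)` overload. A minimal sketch, assuming a standard .NET console project (`SubstitutionDemo` and `Results` are hypothetical names):

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubstitutionDemo
{
    // Each entry applies one substitution from the table; the comment shows the result.
    public static string[] Results() => new[]
    {
        Regex.Replace("one two", @"\b(\w+)(\s)(\w+)\b", "$3$2$1"),                            // two one
        Regex.Replace("one two", @"\b(?<word1>\w+)(\s)(?<word2>\w+)\b", "${word2} ${word1}"), // two one
        Regex.Replace("103 USD", @"\b(\d+)\s?USD", "$$$1"),                                    // $103
        Regex.Replace("$1.30", @"\$?\d*\.?\d+", "**$&**"),                                     // **$1.30**
        Regex.Replace("AABBCC", "B+", "$`"),                                                   // AAAACC
        Regex.Replace("AABBCC", "B+", "$'"),                                                   // AACCCC
        Regex.Replace("AABBCCDD", "B+(C+)", "$+"),                                             // AACCDD
        Regex.Replace("AABBCC", "B+", "$_"),                                                   // AAAABBCCCC
    };

    public static void Main()
    {
        foreach (string result in Results())
            Console.WriteLine(result);
    }
}
```

Because `$` is special in a replacement pattern, any literal `$` in replacement text (for example, text supplied by a user) must be written as `$$`.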
Miscellaneous constructs
The following table lists the miscellaneous constructs:
Construct | Description | Example |
---|---|---|
`(?imnsx-imnsx)` | Enables or disables options such as case insensitivity in the middle of a pattern. | `\bA(?i)b\w+\b` matches "ABA" and "Able" in "ABA Able Act" |
`(?#comment)` | Inline comment. The comment ends at the first closing parenthesis. | `\bA(?#Matches words starting with A)\w+\b` |
`#` [to end of line] | X-mode comment. The comment starts at an unescaped `#` and continues to the end of the line. | `(?x)\bA\w+\b#Matches words starting with A` |
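A minimal sketch of these constructs, assuming a standard .NET console project (`OptionsDemo`, `Find`, and `VerbosePattern` are hypothetical names). It uses `RegexOptions.IgnorePatternWhitespace`, the option form of `(?x)`, so that a pattern can be spread over several lines with `#` comments.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class OptionsDemo
{
    // With IgnorePatternWhitespace, unescaped white space in the pattern is ignored
    // and an unescaped # starts a comment that runs to the end of the line.
    public const string VerbosePattern = @"
        \b A   # a word that begins with A
        \w+    # followed by one or more word characters
        \b";

    // Hypothetical helper: joins the text of every match with '|'.
    public static string Find(string input, string pattern, RegexOptions options = RegexOptions.None) =>
        string.Join("|", Regex.Matches(input, pattern, options).Cast<Match>().Select(m => m.Value));

    public static void Main()
    {
        string input = "ABA Able Act";

        // (?i) turns on case-insensitive matching from that point in the pattern onward.
        Console.WriteLine(Find(input, @"\bA(?i)b\w+\b"));                        // ABA|Able

        // (?#...) is an inline comment and does not change what is matched.
        Console.WriteLine(Find(input, @"\bA(?#words that begin with A)\w+\b"));  // ABA|Able|Act

        Console.WriteLine(Find(input, VerbosePattern, RegexOptions.IgnorePatternWhitespace)); // ABA|Able|Act
    }
}
```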
Regex class
The `Regex` class is used to represent a regular expression.
The following table lists some commonly used methods of the `Regex` class:
No. | Method and description |
---|---|
1 | `public bool IsMatch(string input)`: Indicates whether the regular expression specified in the `Regex` constructor finds a match in the specified input string. |
2 | `public bool IsMatch(string input, int startat)`: Indicates whether the regular expression specified in the `Regex` constructor finds a match in the specified input string, beginning at the specified starting position in the string. |
3 | `public static bool IsMatch(string input, string pattern)`: Indicates whether the specified regular expression finds a match in the specified input string. |
4 | `public MatchCollection Matches(string input)`: Searches the specified input string for all occurrences of the regular expression. |
5 | `public string Replace(string input, string replacement)`: In the specified input string, replaces all strings that match the regular expression pattern with the specified replacement string. |
6 | `public string[] Split(string input)`: Splits the input string into an array of substrings at the positions defined by the regular expression pattern specified in the `Regex` constructor. |
For the complete list of members of the `Regex` class, refer to Microsoft's .NET documentation.
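The following minimal sketch calls each of the six methods listed above once, assuming a standard .NET console project (`RegexMethodsDemo`, `Digits`, and `SplitList` are hypothetical names). Constructing a `Regex` instance once and reusing it avoids re-parsing the pattern on every call.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexMethodsDemo
{
    // A single Regex instance, constructed once and reused by several calls below.
    public static readonly Regex Digits = new Regex(@"\d+");

    // Hypothetical helper: splits a comma-separated list, trimming white space around each comma.
    public static string[] SplitList(string text) => new Regex(@"\s*,\s*").Split(text);

    public static void Main()
    {
        Console.WriteLine(Digits.IsMatch("abc123"));            // True
        Console.WriteLine(Digits.IsMatch("abc123", 3));         // True: the search starts at '1'
        Console.WriteLine(Digits.IsMatch("abc123", 6));         // False: the search starts at the end
        Console.WriteLine(Regex.IsMatch("abc", @"\d"));         // False: static overload, no instance needed

        Console.WriteLine(Digits.Matches("a1b22c333").Count);   // 3
        Console.WriteLine(Digits.Replace("a1b22c333", "#"));    // a#b#c#
        Console.WriteLine(string.Join("|", SplitList("red, green ,blue"))); // red|green|blue
    }
}
```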
Example 1
The following example matches a word that starts with ‘S’:
```csharp
using System;
using System.Text.RegularExpressions;

namespace RegExApplication
{
    class Program
    {
        private static void showMatch(string text, string expr)
        {
            Console.WriteLine("The Expression: " + expr);
            MatchCollection mc = Regex.Matches(text, expr);
            foreach (Match m in mc)
            {
                Console.WriteLine(m);
            }
        }

        static void Main(string[] args)
        {
            string str = "A Thousand Splendid Suns";
            Console.WriteLine("Matching words that start with 'S': ");
            showMatch(str, @"\bS\S*");
            Console.ReadKey();
        }
    }
}
```
When the above code is compiled and executed, it produces the following results:
```
Matching words that start with 'S':
The Expression: \bS\S*
Splendid
Suns
```
Example 2
The following example matches a word that starts with ‘m’ and ends with ‘e’:
```csharp
using System;
using System.Text.RegularExpressions;

namespace RegExApplication
{
    class Program
    {
        private static void showMatch(string text, string expr)
        {
            Console.WriteLine("The Expression: " + expr);
            MatchCollection mc = Regex.Matches(text, expr);
            foreach (Match m in mc)
            {
                Console.WriteLine(m);
            }
        }

        static void Main(string[] args)
        {
            string str = "make maze and manage to measure it";
            // A regular string literal cannot span lines, so the message stays on one line.
            Console.WriteLine("Matching words start with 'm' and ends with 'e':");
            showMatch(str, @"\bm\S*e\b");
            Console.ReadKey();
        }
    }
}
```
When the above code is compiled and executed, it produces the following results:
```
Matching words start with 'm' and ends with 'e':
The Expression: \bm\S*e\b
make
maze
manage
measure
```
Example 3
The following example replaces extra spaces:
```csharp
using System;
using System.Text.RegularExpressions;

namespace RegExApplication
{
    class Program
    {
        static void Main(string[] args)
        {
            // The input contains runs of several spaces.
            string input = "Hello   World   ";
            // "\\s+" is the regex \s+ : one or more white-space characters.
            string pattern = "\\s+";
            string replacement = " ";
            Regex rgx = new Regex(pattern);
            string result = rgx.Replace(input, replacement);

            Console.WriteLine("Original String: {0}", input);
            Console.WriteLine("Replacement String: {0}", result);
            Console.ReadKey();
        }
    }
}
```
When the above code is compiled and executed, it produces the following results:
```
Original String: Hello   World   
Replacement String: Hello World 
```