13.6. Regular expressions: examples

13.6.1. Simple expression

The simplest regular expression is a single ordinary character that matches itself in the search string. For example, the single-character pattern A always matches the letter A wherever it appears in the search string. Here are some examples of single-character regular expression patterns:

/a/
/7/
/M/

Several single characters can be combined to form a larger expression. For example, the following regular expression combines the single-character expressions a, 7, and M:

/a7M/

Notice that there is no concatenation operator. Just type one character after another.
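As a minimal sketch of the idea above, assuming Python's re module rather than the slash-delimited notation used in this section, the patterns are written as raw strings and the sample input strings are invented for illustration:

```python
import re

# Each ordinary character in a pattern matches itself.
single = re.search(r"7", "Room 7A")

# Concatenation is implicit: "a7M" means a, then 7, then M.
combined = re.search(r"a7M", "xxa7Mxx")

# Matching is case-sensitive unless a flag says otherwise.
no_match = re.search(r"a7M", "a7m")

print(single.group())   # 7
print(combined.span())  # (2, 5)
print(no_match)         # None
```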

13.6.2. Character matching

The period (.) matches any single printing or non-printing character in a string, with one exception: the newline character (\n). The following regular expression matches aac, abc, acc, adc, and so on, as well as a1c, a2c, a-c, and a#c:

/a.c/

To match a string in which the period (.) is itself part of the input, such as a file name, precede the period in the regular expression with a backslash (\) character. For example, the following regular expression matches filename.ext:

/filename\.ext/
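The difference between the wildcard period and the escaped period can be checked with a short sketch. It assumes Python's re module, and the test strings are illustrative only:

```python
import re

any_char = re.compile(r"a.c")
literal_dot = re.compile(r"filename\.ext")

# The period accepts any one character except a newline, and it is not optional.
samples = ["abc", "a1c", "a-c", "a\nc", "ac"]
dot_hits = [s for s in samples if any_char.fullmatch(s)]
print(dot_hits)  # ['abc', 'a1c', 'a-c']

# Escaped, the period matches only a literal period.
escaped_ok = bool(literal_dot.search("filename.ext"))
escaped_other = bool(literal_dot.search("filenameXext"))
unescaped_other = bool(re.search(r"filename.ext", "filenameXext"))
print(escaped_ok, escaped_other, unescaped_other)  # True False True
```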

These expressions only let you match any single character. Often you need to match one character from a specific list instead. For example, you may need to find chapter headings that are identified by numbers (Chapter 1, Chapter 2, and so on).

13.6.3. Square bracket expression

To create a list of matching characters, place one or more individual characters within square brackets ([ and ]). When characters are enclosed in brackets, the list is called a bracket expression. Within brackets, as anywhere else, an ordinary character represents itself, that is, it matches one occurrence of itself in the input text. Most special characters lose their meaning when they appear within a bracket expression. There are some exceptions, such as:

The ] character ends a list if it is not the first item. To match the ] character in a list, place it first, immediately after the opening [.
The \ character continues to be the escape character. To match the \ character, use \\.

The characters enclosed in a bracket expression match only a single character at that position in the regular expression. The following regular expression matches Chapter 1, Chapter 2, Chapter 3, Chapter 4, and Chapter 5:

/Chapter [12345]/

Notice that the word Chapter and the space that follows it are fixed in position relative to the characters in brackets. The bracket expression specifies only the set of characters that can match the single character position immediately following the word Chapter and the space. This is the ninth character position.

To express the matching characters as a range instead of listing each character, separate the start and end characters of the range with a hyphen (-). The character value of each character determines its relative order within the range. The following regular expression contains a range expression that is equivalent to the bracketed list shown above:

/Chapter [1-5]/

When a range is specified this way, both the start and end values are included in the range. Note also that the start value must precede the end value in Unicode sort order.
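The equivalence of the list and the range, and the ordering requirement, can be sketched as follows. This assumes Python's re module. The sample headings are invented, and how a reversed range is reported is specific to Python (other engines may behave differently):

```python
import re

headings = ["Chapter 1", "Chapter 5", "Chapter 6", "Chapter 12", "chapter 3"]

# The explicit list and the range describe the same set of characters.
by_list = [h for h in headings if re.search(r"Chapter [12345]", h)]
by_range = [h for h in headings if re.search(r"Chapter [1-5]", h)]
print(by_list)              # ['Chapter 1', 'Chapter 5', 'Chapter 12']
print(by_list == by_range)  # True

# A range whose start comes after its end is rejected when compiled.
try:
    re.compile(r"Chapter [5-1]")
    reversed_ok = True
except re.error:
    reversed_ok = False
print(reversed_ok)          # False
```

Chapter 12 is reported because the bracket expression constrains only the ninth character; nothing in the pattern says what may follow it.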

To include a hyphen in a square bracket expression, use one of the following methods:

Escape it with a backslash:

[\-]

Place the hyphen at the beginning or end of the bracketed list. The following expressions match all lowercase letters and the hyphen:

[-a-z]
[a-z-]

Create a range in which the start character value is lower than the hyphen and the end character value is equal to or greater than the hyphen. Both of the following regular expressions meet this requirement:

[!--]
[!-~]
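The three methods above can be compared side by side. This is a minimal sketch assuming Python's re module. The warning filter is a precaution, because some Python versions emit a warning about "--" inside a bracket expression:

```python
import re
import warnings

patterns = [r"[\-]", r"[-a-z]", r"[a-z-]", r"[!--]", r"[!-~]"]

with warnings.catch_warnings():
    # Some Python versions warn that "--" inside a set may change meaning
    # in a future release; silence that so the demonstration stays quiet.
    warnings.simplefilter("ignore")
    compiled = [re.compile(p) for p in patterns]

# Every variant accepts a literal hyphen.
matches_hyphen = [bool(c.fullmatch("-")) for c in compiled]
print(matches_hyphen)  # [True, True, True, True, True]

# [!--] spans "!" through "-", so it accepts "+" but not "a".
plus_ok = bool(compiled[3].fullmatch("+"))
letter_ok = bool(compiled[3].fullmatch("a"))
print(plus_ok, letter_ok)  # True False
```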

To find all characters that are not in the list or range, place the caret (^) at the beginning of the list. If the caret appears anywhere else in the list, it matches itself. The following regular expression matches chapter headings in which the character after the space is anything other than 1, 2, 3, 4, or 5:

/Chapter [^12345]/

In the example above, the expression matches any character other than 1, 2, 3, 4, or 5 in the ninth position. So, for example, Chapter 7 is a match, and so is Chapter 9.

The expression above can also be written with a range, using a hyphen (-):

/Chapter [^1-5]/
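A short sketch of negation, assuming Python's re module and invented sample strings. Note that a negated list still has to match some character, so a heading with nothing in the ninth position does not match:

```python
import re

negated = re.compile(r"Chapter [^1-5]")
candidates = ["Chapter 7", "Chapter 9", "Chapter 3", "Chapter X", "Chapter "]
hits = [c for c in candidates if negated.search(c)]
print(hits)  # ['Chapter 7', 'Chapter 9', 'Chapter X']

# A caret that is not first in the list is an ordinary character.
caret_literal = re.findall(r"[1^]", "2^1")
print(caret_literal)  # ['^', '1']
```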

A typical use of a bracket expression is to match any uppercase or lowercase letter or any digit. The following expression specifies such a match:

/[A-Za-z0-9]/
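As an illustration, assuming Python's re module, the bracket expression can be applied to a sample string. The + quantifier in the second pattern is an addition for this sketch, used to collect runs of matching characters, and is not part of the expression above:

```python
import re

# One match per letter or digit; punctuation and the underscore are skipped.
singles = re.findall(r"[A-Za-z0-9]", "a-Z_9")
print(singles)  # ['a', 'Z', '9']

# With + added, consecutive letters and digits are returned as one token.
tokens = re.findall(r"[A-Za-z0-9]+", "id=42, name=Bob_7")
print(tokens)   # ['id', '42', 'name', 'Bob', '7']
```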

13.6.4. Alternation and grouping

Alternation uses the | character to allow a choice between two or more alternatives. For example, you can extend the chapter heading regular expression to match more than just chapter headings. However, this is not as simple as you might think. Alternation matches the largest possible expression on either side of the | character.

You might think that the following expression matches either Chapter or Section at the beginning of a line, followed by one or two digits and then the end of the line:

/^Chapter|Section [1-9][0-9]{0,1}$/

Unfortunately, the regular expression above matches either the word Chapter at the beginning of a line, or the word Section followed by a one- or two-digit number at the end of a line. If the input string is Chapter 22, the expression matches only the word Chapter. If the input string is Section 22, the expression matches Section 22.

To make the regular expression behave as intended, you can use parentheses to limit the scope of the alternation, that is, to make sure that it applies only to the two words Chapter and Section. However, parentheses are also used to create subexpressions and possibly capture them for later use, as described in the section on backreferences. By adding parentheses in the appropriate place in the regular expression above, you can make it match either Chapter 1 or Section 3.

The following regular expression uses parentheses to group Chapter and Section so that the expression works correctly:

/^(Chapter|Section) [1-9][0-9]{0,1}$/
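The effect of the parentheses on the scope of the alternation can be observed directly. The sketch below assumes Python's re module, and the input strings are examples only:

```python
import re

loose = re.compile(r"^Chapter|Section [1-9][0-9]{0,1}$")
grouped = re.compile(r"^(Chapter|Section) [1-9][0-9]{0,1}$")

# Without parentheses, | splits the whole pattern into two halves.
loose_chapter = loose.search("Chapter 22").group()
loose_section = loose.search("Section 22").group()
print(loose_chapter)  # Chapter
print(loose_section)  # Section 22

# With parentheses, the anchors and the number apply to both words.
grouped_chapter = grouped.search("Chapter 22").group()
print(grouped_chapter)  # Chapter 22

# The loose pattern even accepts a heading that has no number at all.
loose_accepts = loose.search("Chapter one") is not None
grouped_accepts = grouped.search("Chapter one") is not None
print(loose_accepts, grouped_accepts)  # True False
```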

Although this expression works properly, the parentheses around Chapter|Section also cause whichever of the two words matched to be captured for later use. Because there is only one set of parentheses in the expression above, there is only one captured submatch.

In the example above, the parentheses are needed only to group the choice between the words Chapter and Section. To prevent the match from being saved for later use, place ?: before the regular expression pattern inside the parentheses. The following modification provides the same capability without saving the submatch:

/^(?:Chapter|Section) [1-9][0-9]{0,1}$/
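The difference between the capturing and non-capturing forms shows up in the groups that a match reports. A minimal sketch, assuming Python's re module:

```python
import re

capturing = re.compile(r"^(Chapter|Section) [1-9][0-9]{0,1}$")
non_capturing = re.compile(r"^(?:Chapter|Section) [1-9][0-9]{0,1}$")

m1 = capturing.search("Section 3")
m2 = non_capturing.search("Section 3")

# Both forms match the same text.
print(m1.group(), "|", m2.group())  # Section 3 | Section 3

# Only the capturing form saves the word as a submatch.
print(m1.groups())  # ('Section',)
print(m2.groups())  # ()
print(capturing.groups, non_capturing.groups)  # 1 0
```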

In addition to the ?: metacharacters, two other non-capturing metacharacters create what are called lookahead matches. A positive lookahead, specified with ?=, matches the search string at any point where a string matching the regular expression pattern in parentheses begins. A negative lookahead, specified with ?!, matches the search string at any point where a string that does not match the regular expression pattern begins.

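For example, suppose you have a document that contains references to Windows 3.1, Windows 95, Windows 98, and Windows NT, and you need to change all references to Windows 95, Windows 98, and Windows NT to Windows 2000. The following regular expression, an example of a positive lookahead, matches the word Windows only where it is immediately followed by 95, 98, or NT and a space. Note that, as written, the pattern allows no space between Windows and the version, so it matches Windows95 but not Windows 95; add a space after Windows in the pattern if the document writes the names with one: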

/Windows(?=95 |98 |NT )/

After a match is found, the search for the next match starts immediately after the matched text, not including the characters in the lookahead. For example, if the expression above matches Windows98, the search resumes after Windows, not after 98.
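The behavior of the lookahead can be traced with a short sketch, assuming Python's re module. The sample text is invented and written without a space between Windows and the version, to fit the pattern exactly as given above. The negative-lookahead pattern is an illustrative counterpart, not one taken from the text:

```python
import re

text = "Windows3.1 Windows95 Windows98 WindowsNT "

positive = re.compile(r"Windows(?=95 |98 |NT )")
negative = re.compile(r"Windows(?!95 |98 |NT )")

# The matched text is only "Windows"; the lookahead is checked, not consumed.
pos_matches = [(m.group(), m.span()) for m in positive.finditer(text)]
print(pos_matches)
# [('Windows', (11, 18)), ('Windows', (21, 28)), ('Windows', (31, 38))]

# The characters right after each match are still available to the next search.
following = [text[m.end():m.end() + 2] for m in positive.finditer(text)]
print(following)  # ['95', '98', 'NT']

# The negative lookahead selects the occurrence the positive one skipped.
neg_spans = [m.span() for m in negative.finditer(text)]
print(neg_spans)  # [(0, 7)]
```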

13.6.5. Other examples

Here are some examples of regular expressions:

Regular expression	Description
/\b([a-z]+) \1\b/gi	Finds places where a word appears twice in succession.
/(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/	Splits a URL into its protocol, domain, port, and relative path.
/^(?:Chapter|Section) [1-9][0-9]{0,1}$/	Locates chapter and section headings.
/[-a-z]/	Matches any of the 26 lowercase letters a to z, plus the hyphen.
/ter\b/	Matches chapter, but not terminal.
/\Bapt/	Matches chapter, but not aptitude.
/Windows(?=95 |98 |NT )/	Matches Windows in Windows95, Windows98, or WindowsNT; after a match is found, the next search starts after Windows.
/^\s*$/	Matches a blank line.
/\d{2}-\d{5}/	Validates an ID number consisting of two digits, a hyphen, and five digits.
<[a-zA-Z]+.*?>([\s\S]*?)	Matches an HTML tag.
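A few of the patterns in the table can be exercised as follows. This is a sketch assuming Python's re module, with the patterns written as raw strings and the usual backslash escapes (\b, \B, \w, \d). The sample sentence and the example.com URL are placeholders chosen for illustration:

```python
import re

# Doubled words: \1 refers back to whatever the first group captured.
doubled = re.compile(r"\b([a-z]+) \1\b", re.IGNORECASE)
sentence = "Is is the cost of of gasoline going up up?"
repeats = [m.group() for m in doubled.finditer(sentence)]
print(repeats)  # ['Is is', 'of of', 'up up']

# \b is a word boundary; \B is a position that is not a word boundary.
ter_hits = [w for w in ["chapter", "terminal"] if re.search(r"ter\b", w)]
apt_hits = [w for w in ["chapter", "aptitude"] if re.search(r"\Bapt", w)]
print(ter_hits, apt_hits)  # ['chapter'] ['chapter']

# Splitting a URL into protocol, domain, port, and path.
url = re.compile(r"(\w+)://([^/:]+)(:\d*)?([^# ]*)")
parts = url.search("http://www.example.com:80/html/page.html").groups()
print(parts)  # ('http', 'www.example.com', ':80', '/html/page.html')

# Two digits, a hyphen, five digits.
id_ok = bool(re.fullmatch(r"\d{2}-\d{5}", "12-34567"))
id_bad = bool(re.fullmatch(r"\d{2}-\d{5}", "123-4567"))
print(id_ok, id_bad)  # True False
```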

Regular expression	Description
`hello`	Match {hello}
`gray|grey`	Match {gray, grey}
`gr(a|e)y`	Match {gray, grey}
`gr[ae]y`	Match {gray, grey}
`b[aeiou]bble`	Match {babble, bebble, bibble, bobble, bubble}
`[b-chm-pP]at|ot`	Match {bat, cat, hat, mat, nat, oat, pat, Pat, ot}
`colou?r`	Match {color, colour}
`rege(x(es)?|xps?)`	Match {regex, regexes, regexp, regexps}
`go*gle`	Match {ggle, gogle, google, gooogle, goooogle,…}
`go+gle`	Match {gogle, google, gooogle, goooogle,…}
`g(oog)+le`	Match {google, googoogle, googoogoogle, googoogoogoogle,…}
`z{3}`	Match {zzz}
`z{3,6}`	Match {zzz, zzzz, zzzzz, zzzzzz}
`z{3,}`	Match {zzz, zzzz, zzzzz,…}
`[Bb]rainf\*\*k`	Match {Brainf**k, brainf**k}
`\d`	Match {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
`1\d{10}`	Matches 11 digits, starting with 1
`[2-9]|[12]\d|3[0-6]`	Match integers in the range of 2 to 36
`Hello\nworld`	Match Hello followed by a newline character, followed by world
`\d+(\.\d\d)?`	Match a non-negative integer or a decimal number with exactly two decimal places
`[^*@#]`	Match any character except the three special characters *, @, and #
`//[^\r\n]*[\r\n]`	Match a comment that begins with //
`^dog`	Match text that starts with “dog”
`dog$`	Match text that ends with “dog”
`^dog$`	Match text that is exactly “dog”
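Several rows of this table can be checked mechanically. The sketch below assumes Python's re module and uses fullmatch so that the whole candidate string must match. The candidate strings are invented for illustration:

```python
import re

# Optional character: u may appear zero or one time.
colour_hits = [w for w in ["color", "colour", "colouur"] if re.fullmatch(r"colou?r", w)]
print(colour_hits)  # ['color', 'colour']

# Bounded repetition: between three and six z characters.
z_hits = [len(s) for s in ["zz", "zzz", "zzzzzz", "zzzzzzz"] if re.fullmatch(r"z{3,6}", s)]
print(z_hits)       # [3, 6]

# Integers from 2 to 36, checked against every number from 0 to 39.
in_range = re.compile(r"[2-9]|[12]\d|3[0-6]")
accepted = [n for n in range(40) if in_range.fullmatch(str(n))]
print(accepted == list(range(2, 37)))  # True

# Without anchoring, the same pattern finds a match inside a larger number.
partial = in_range.search("99").group()
print(partial)      # 9

# Integer, or a number with exactly two decimal places.
money_hits = [s for s in ["19", "19.99", "19.9"] if re.fullmatch(r"\d+(\.\d\d)?", s)]
print(money_hits)   # ['19', '19.99']
```

The unanchored search on 99 is a reminder that rows such as the 2 to 36 range describe which strings the pattern can match, not a validation of the whole input; validation needs ^ and $ or a full-match call.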