13.5. Regular expressions: matching rules

13.5.1. Basic pattern matching

It all starts with the basics. A pattern, the most basic element of a regular expression, is a sequence of characters that describes a property of a string. For example:

^once

This pattern contains the special character ^, which indicates that the pattern matches only strings that begin with once. For example, the pattern matches the string “once upon a time” but does not match “There once was a man from NewYork”. Just as the ^ symbol marks the beginning, the $ symbol is used to match strings that end with a given pattern.

bucket$

This pattern matches “Who kept all of his cash in a bucket” but does not match “buckets”. When the characters ^ and $ are used together, they indicate an exact match (the string is identical to the pattern). For example:

^bucket$

This matches only the string “bucket”. If a pattern includes neither ^ nor $, it matches any string that contains the pattern. For example, the pattern:

once

and the string:

There once was a man from NewYork
Who kept all of his cash in a bucket.

The pattern matches this string, because “once” appears in its first line.
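The anchoring rules above can be checked with a short sketch. It assumes Python's `re` module, which the text itself does not name, and the variable names are made up for illustration. `re.search` is used so that anchoring is decided only by ^ and $ in the pattern.

```python
import re

# The two lines of the limerick quoted in the text (trailing period omitted).
line1 = "There once was a man from NewYork"
line2 = "Who kept all of his cash in a bucket"

# re.search scans the whole string, so only ^ and $ restrict where a match may occur.
starts_with_once = re.search(r"^once", "once upon a time") is not None
once_not_at_start = re.search(r"^once", line1) is not None
ends_with_bucket = re.search(r"bucket$", line2) is not None
plural_rejected = re.search(r"bucket$", "buckets") is not None
exact_match = re.search(r"^bucket$", "bucket") is not None
unanchored = re.search(r"once", line1) is not None

print(starts_with_once, once_not_at_start, ends_with_bucket,
      plural_rejected, exact_match, unanchored)
# prints: True False True False True True
```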

The letters in this pattern (o-n-c-e) are literal characters: each one stands for itself, and the same is true of digits. Other, slightly more complex characters, such as some punctuation marks and whitespace characters (spaces, tabs, and so on), use escape sequences. All escape sequences start with a backslash (\). The escape sequence for a tab is \t. So if we want to detect whether a string begins with a tab, we can use this pattern:

^\t

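Similarly, \n represents a newline and \r represents a carriage return. Other special symbols can be written by putting a backslash in front of them: the backslash itself is written as \\, the period . is written as \., and so on.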
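A minimal sketch of these escape sequences, again assuming Python's `re` module (an assumption, not something the text prescribes). Raw strings are used so that the backslashes reach the regex engine unchanged; the sample strings are invented for illustration.

```python
import re

# ^\t matches only when the very first character is a tab.
begins_with_tab = re.search(r"^\t", "\tindented line") is not None
begins_with_space = re.search(r"^\t", " indented line") is not None

# An escaped dot matches only a literal period; an unescaped dot matches any character.
escaped_dot_accepts = re.search(r"^3\.14$", "3.14") is not None
escaped_dot_rejects = re.search(r"^3\.14$", "3x14") is not None
bare_dot_accepts = re.search(r"^3.14$", "3x14") is not None

# \\ in the pattern matches one backslash in the input string.
has_backslash = re.search(r"\\", "C:\\temp") is not None

print(begins_with_tab, begins_with_space, escaped_dot_accepts,
      escaped_dot_rejects, bare_dot_accepts, has_backslash)
# prints: True False True False True True
```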

13.5.2. Character cluster

In Internet programs, regular expressions are usually used to validate user input. When a user submits a form, ordinary literal characters are not enough to determine whether the entered phone number, address, email address, credit card number, and so on are valid.

So we need a more flexible way to describe the pattern we want: the character cluster. To create a character cluster that represents all vowels, place all the vowel characters in square brackets:

[AaEeIiOoUu]

This pattern matches any vowel, but it stands for only one character. A hyphen can be used to indicate a range of characters, such as:

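[a-z] // matches any lowercase letter
[A-Z] // matches any uppercase letter
[a-zA-Z] // matches any letter
[0-9] // matches any digit
[0-9\.\-] // matches any digit, period, or minus sign
[ \f\r\t\n] // matches any whitespace character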

Again, each of these represents only a single character, which is an important point. If you want to match a string consisting of one lowercase letter followed by one digit, such as “z2”, “t6”, or “g7”, but not “ab2”, “r2d3”, or “b52”, use this pattern:

^[a-z][0-9]$

Although [a-z] represents a range of 26 letters, here it matches only a single character: the first character of the string, which must be a lowercase letter.
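The one-character-per-cluster rule can be seen in a small sketch, assuming Python's `re` module (not named in the text). The candidate list reuses the strings from the paragraph above and adds an uppercase “Z2” to show that [a-z] is case-sensitive.

```python
import re

# One lowercase letter followed by exactly one digit, and nothing else.
letter_digit = re.compile(r"^[a-z][0-9]$")

candidates = ["z2", "t6", "g7", "ab2", "r2d3", "b52", "Z2"]
accepted = [s for s in candidates if letter_digit.search(s)]
print(accepted)  # prints: ['z2', 't6', 'g7']

# A cluster stands for one character, so findall returns vowels one at a time.
vowels = re.findall(r"[AaEeIiOoUu]", "Regular Expression")
print(vowels)  # prints: ['e', 'u', 'a', 'E', 'e', 'i', 'o']
```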

It was mentioned earlier that ^ represents the beginning of a string, but it has another meaning. When ^ is the first character inside a pair of square brackets, it means “not” or “exclude”, and it is often used to rule out certain characters. Continuing the previous example, we can require that the first character not be a digit:

^[^0-9][0-9]$

This pattern matches “&5”, “g7”, and “-2”, but does not match “12” or “66”. Here are a few examples of excluding specific characters:

[^a-z] // any character except lowercase letters
[^\\\/\^] // any character except (\), (/), and (^)
[^\"\'] // any character except the double quote (") and the single quote (')
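A brief sketch of negated clusters, assuming Python's `re` module (an assumption on top of the text). The quoted sentence in the second part is invented sample data.

```python
import re

# First character must NOT be a digit; second character must be a digit.
non_digit_then_digit = re.compile(r"^[^0-9][0-9]$")

samples = ["&5", "g7", "-2", "12", "66"]
matched = [s for s in samples if non_digit_then_digit.search(s)]
print(matched)  # prints: ['&5', 'g7', '-2']

# Keep every character that is neither a double quote nor a single quote.
quoted = "say \"hi\" and 'bye'"
unquoted = "".join(re.findall(r"[^\"']", quoted))
print(unquoted)  # prints: say hi and bye
```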

The special character . (dot, period) is used in regular expressions to represent any character except a newline. So the pattern ^.5$ matches any two-character string that ends with the digit 5 and begins with any non-newline character. The pattern . by itself matches any non-empty string except one made up only of line-break characters (\n, \r); exactly which characters count as a line break depends on the regex engine.
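The behaviour of the dot can be sketched as follows, assuming Python's `re` module. Note the engine-specific detail in the last check: in Python's default mode the dot excludes only \n, so a carriage return is accepted, which may differ from the engine the text has in mind.

```python
import re

two_chars_ending_in_5 = re.compile(r"^.5$")

letter_then_5 = two_chars_ending_in_5.search("a5") is not None
digit_then_5 = two_chars_ending_in_5.search("55") is not None
too_short = two_chars_ending_in_5.search("5") is not None
too_long = two_chars_ending_in_5.search("ab5") is not None
newline_then_5 = two_chars_ending_in_5.search("\n5") is not None

# Python-specific: without re.DOTALL the dot rejects only "\n", so "\r" is matched.
carriage_return_then_5 = two_chars_ending_in_5.search("\r5") is not None

print(letter_then_5, digit_then_5, too_short, too_long,
      newline_then_5, carriage_return_then_5)
# prints: True True False False False True
```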

PHP’s regular expressions have some built-in general-purpose character clusters, which are listed below:

Character cluster	Description
[[:alpha:]]	Any letter
[[:digit:]]	Any digit
[[:alnum:]]	Any letter or digit
[[:space:]]	Any whitespace character
[[:upper:]]	Any uppercase letter
[[:lower:]]	Any lowercase letter
[[:punct:]]	Any punctuation mark
[[:xdigit:]]	Any hexadecimal digit; equivalent to [0-9a-fA-F]
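Not every engine accepts the [[:name:]] syntax; Python's `re` module, for instance, does not. The sketch below shows one way to get the same effect with explicit ranges. The `POSIX_CLASSES` table and the `posix_class` helper are hypothetical names introduced here, and the expansions cover ASCII characters only.

```python
import re
import string

# Hypothetical lookup table: ASCII-only expansions of the clusters listed above.
POSIX_CLASSES = {
    "alpha": "a-zA-Z",
    "digit": "0-9",
    "alnum": "a-zA-Z0-9",
    "space": r" \t\n\r\f\v",
    "upper": "A-Z",
    "lower": "a-z",
    "punct": re.escape(string.punctuation),  # escaped so ], ^, - and \ stay literal
    "xdigit": "0-9a-fA-F",
}


def posix_class(name):
    """Return a bracket expression standing in for [[:name:]] on ASCII input."""
    return "[" + POSIX_CLASSES[name] + "]"


three_letters = re.compile("^" + posix_class("alpha") + "{3}$")
hex_number = re.compile("^" + posix_class("xdigit") + "+$")
one_punct = re.compile("^" + posix_class("punct") + "$")

print(bool(three_letters.search("cat")), bool(three_letters.search("ca7")))
# prints: True False
print(bool(hex_number.search("1aF")), bool(hex_number.search("1aG")))
# prints: True False
```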

13.5.3. Determining repetition

By now you know how to match a single letter or digit, but more often you will want to match a word or a number. A word consists of several letters, and a number consists of several digits. Curly braces ({}) placed after a character or character cluster specify how many times the preceding item must repeat.

Pattern	Description
`^[a-zA-Z_]$`	A single letter or underscore
`^[[:alpha:]]{3}$`	Any three-letter word
`^a$`	The letter a
`^a{4}$`	aaaa
`^a{2,4}$`	aa, aaa, or aaaa
`^a{1,3}$`	a, aa, or aaa
`^a{2,}$`	A string of two or more a's
`^a{2,}`	For example, aardvark and aaab, but not apple
`a{2,}`	For example, baad and aaa, but not Nantucket
`\t{2}`	Two tabs
`.{2}`	Any two characters
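Several rows of the table can be checked with a short sketch, assuming Python's `re` module. The `matches` helper is a hypothetical convenience function, not part of any library.

```python
import re


def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


# {x}: exactly x repetitions.
exactly_four = [matches(r"^a{4}$", s) for s in ["aaa", "aaaa", "aaaaa"]]

# {x,y}: between x and y repetitions, inclusive.
two_to_four = [matches(r"^a{2,4}$", s) for s in ["a", "aa", "aaaa", "aaaaa"]]

# {x,}: x or more repetitions, anchored at the start or floating.
leading_aa = [matches(r"^a{2,}", s) for s in ["aardvark", "aaab", "apple"]]
anywhere_aa = [matches(r"a{2,}", s) for s in ["baad", "aaa", "Nantucket"]]

print(exactly_four)  # prints: [False, True, False]
print(two_to_four)   # prints: [False, True, True, False]
print(leading_aa)    # prints: [True, True, False]
print(anywhere_aa)   # prints: [True, True, False]
```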

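These examples illustrate three different uses of curly braces. A single number, {x}, means that the preceding character or character cluster appears exactly x times; a number followed by a comma, {x,}, means that the preceding item appears x or more times; two comma-separated numbers, {x,y}, mean that the preceding item appears at least x times but no more than y times. We can extend these patterns to cover more words and numbers: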

^[a-zA-Z0-9_]{1,}$      // any string of one or more letters, digits, or underscores
^[1-9][0-9]{0,}$        // any positive integer
^\-{0,1}[0-9]{1,}$      // any integer
^[-]?[0-9]+\.?[0-9]+$   // any floating-point number

The last example is not easy to understand, is it? Read it this way: the string begins (^) with an optional minus sign ([-]?), followed by one or more digits ([0-9]+), an optional decimal point (\.?), and one or more further digits ([0-9]+), with nothing else after them ($). Because the decimal point is optional, the pattern also accepts integers of two or more digits, such as 12, and it rejects a single digit such as 5. The simpler notation used here is explained below.

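The special character ? is equivalent to {0,1}; both mean “zero or one of the preceding item”, or “the preceding item is optional”. So the example above can be written as: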

^\-?[0-9]{1,}\.?[0-9]{1,}$

The special character * is equivalent to {0,}; both mean “zero or more of the preceding item”. Finally, the character + is equivalent to {1,}, meaning “one or more of the preceding item”. So the four examples above can be written as follows:

^[a-zA-Z0-9_]+$      // any string of one or more letters, digits, or underscores
^[1-9][0-9]*$        // any positive integer
^\-?[0-9]+$          // any integer
^[-]?[0-9]+(\.[0-9]+)?$ // any floating-point number

Of course, this does not technically reduce the complexity of regular expressions, but it can make them easier to read.
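As a rough check that the shorthand forms behave like their curly-brace equivalents, the sketch below runs both spellings over the same sample strings. It assumes Python's `re` module, and the sample data is invented. It also compares the two floating-point patterns from this section, which are not equivalent: the grouped form accepts a single digit such as “5”, while the earlier form does not.

```python
import re

# (curly-brace form, shorthand form) pairs taken from the text.
pairs = [
    (r"^[a-zA-Z0-9_]{1,}$", r"^[a-zA-Z0-9_]+$"),
    (r"^[1-9][0-9]{0,}$", r"^[1-9][0-9]*$"),
    (r"^\-{0,1}[0-9]{1,}$", r"^\-?[0-9]+$"),
]
samples = ["abc_123", "", "42", "007", "-15", "3.14", "a b"]

# Both spellings should accept and reject exactly the same samples.
all_agree = all(
    bool(re.search(braces, s)) == bool(re.search(short, s))
    for braces, short in pairs
    for s in samples
)

positive_integers = [s for s in samples if re.search(r"^[1-9][0-9]*$", s)]
integers = [s for s in samples if re.search(r"^\-?[0-9]+$", s)]

# The two floating-point patterns differ on single-digit input.
numbers = ["5", "12", "3.14", "-0.5", "1.", ".5"]
loose = [s for s in numbers if re.search(r"^[-]?[0-9]+\.?[0-9]+$", s)]
grouped = [s for s in numbers if re.search(r"^[-]?[0-9]+(\.[0-9]+)?$", s)]

print(all_agree)          # prints: True
print(positive_integers)  # prints: ['42']
print(integers)           # prints: ['42', '007', '-15']
print(loose)              # prints: ['12', '3.14', '-0.5']
print(grouped)            # prints: ['5', '12', '3.14', '-0.5']
```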