Regular expressions
In this PHP tutorial you will learn about regular expressions: basic PCRE syntax, character classes, preg_match(), and extracting data with regular expressions.
Basic PCRE Syntax:
A regular expression pattern is a string consisting of plain text and pattern metacharacters.
The regexp metacharacters define the type and number of characters that can match part of a pattern.
Character classes allow a single position in a pattern to match any one character from a set of characters.
Character classes are:
\d Digits 0–9
\D Anything not a digit
\w Any alphanumeric character or an underscore (_)
\W Anything not an alphanumeric character or an underscore
\s Any whitespace
\S Any nonwhitespace character
. Any character except for a newline
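As a quick illustration of the list above, the following sketch tests each kind of metacharacter against a made-up sample string. The variable names and sample subjects are only illustrative, and the patterns are written in single quotes so that PHP passes the backslashes through unchanged.

```php
<?php
// Each call tests one character-class metacharacter against a sample subject.
// preg_match() returns 1 when the pattern matches and 0 when it does not.
$hasDigit      = preg_match('/\d/', 'Room 42');    // 1: "4" is a digit
$hasNonDigit   = preg_match('/\D/', '2024');       // 0: every character is a digit
$hasWordChar   = preg_match('/\w/', '_');          // 1: underscore counts as a word character
$hasNonWord    = preg_match('/\W/', 'a-b');        // 1: "-" is not a word character
$hasWhitespace = preg_match('/\s/', "tab\there");  // 1: the tab is whitespace
$dotOnNewline  = preg_match('/./', "\n");          // 0: . does not match a newline by default

var_dump($hasDigit, $hasNonDigit, $hasWordChar, $hasNonWord, $hasWhitespace, $dotOnNewline);
```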
Character class metacharacters match a single character.
You need to be able to specify how many times they must match.
To do this, use the enumeration operators.
The enumeration operators are:
? 0 or 1 time
* 0 or more times
+ 1 or more times
{n} exactly n times
{0,n} at most n times (older PCRE versions treat the short form {,n} as literal text)
{m,} m or more times
{m,n} between m and n times
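The operators above can be tried out with a few small patterns. This is only a sketch with invented sample strings; it uses the ^ and $ anchors, which are described later in this tutorial, so that each operator has to account for the whole subject.

```php
<?php
// ^ and $ pin the pattern to the whole subject, so the counts are easy to see.
$optional = preg_match('/^colou?r$/', 'color');  // 1: ? lets the "u" be absent
$zeroPlus = preg_match('/^ab*$/', 'a');          // 1: * accepts zero "b" characters
$onePlus  = preg_match('/^ab+$/', 'a');          // 0: + needs at least one "b"
$exact    = preg_match('/^\d{5}$/', '21797');    // 1: exactly five digits
$atLeast  = preg_match('/^\d{3,}$/', '12345');   // 1: three or more digits
$between  = preg_match('/^\d{2,4}$/', '12345');  // 0: five digits is more than four

var_dump($optional, $zeroPlus, $onePlus, $exact, $atLeast, $between);
```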
Example:
if (preg_match('/\d{5}-\d{4}/', $subject)) {
    // $subject contains a ZIP+4 code
}
preg_match() takes two required arguments: the first is the pattern, and the second is the subject string.
The pattern is enclosed in delimiters, which in these examples are forward slashes.
preg_match() will match anywhere it can in the subject string.
If you want to specify that the pattern must start matching immediately at the beginning of the subject, you should use the positional anchor ^.
You can match the end of a string with the positional anchor $.
Example:
if (preg_match('/^\d{5}-\d{4}$/', $subject)) {
    // $subject is exactly a ZIP+4 code
}
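To make the difference concrete, the sketch below runs the unanchored and the anchored pattern against the same subject. The sample strings are invented for illustration.

```php
<?php
$subject = 'ZIP: 21797-2046 (home)';

// Without anchors the pattern may match anywhere inside the subject.
$unanchored = preg_match('/\d{5}-\d{4}/', $subject);        // 1

// With ^ and $ the whole subject must be the ZIP+4 code, so the extra text fails.
$anchored = preg_match('/^\d{5}-\d{4}$/', $subject);        // 0

// A subject that consists only of the code passes the anchored pattern.
$exact = preg_match('/^\d{5}-\d{4}$/', '21797-2046');       // 1

var_dump($unanchored, $anchored, $exact);
```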
You can create your own character classes by enclosing the desired characters in brackets ([]).
Ranges are allowed: to create a character class that matches only the digits 2 through 9, you can use [2-9].
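A short sketch of custom character classes follows. The three-digit code format and the hexadecimal check are hypothetical examples chosen only to show a range and a combination of ranges in use.

```php
<?php
// Hypothetical format: a three-digit code whose first digit must be 2 through 9.
$pattern = '/^[2-9]\d{2}$/';

$startsWithFive = preg_match($pattern, '555');  // 1: "5" is inside the range 2-9
$startsWithOne  = preg_match($pattern, '155');  // 0: "1" is outside the range 2-9

// Several ranges can be combined in one class: here, lowercase hexadecimal digits.
$isHex  = preg_match('/^[0-9a-f]+$/', 'c0ffee');  // 1
$notHex = preg_match('/^[0-9a-f]+$/', 'coffee');  // 0: "o" is not a hexadecimal digit

var_dump($startsWithFive, $startsWithOne, $isHex, $notHex);
```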
Patterns can have aspects of their base behavior changed by appending modifiers after the closing delimiter.
Some common pattern modifiers are:
i Matches case-insensitively
m Enables the positional anchors ^ and $ to match at the start and end of every line in a subject string
s Enables . to match newlines
x Enables comments and whitespace in regexps
u Treats the pattern and the subject string as UTF-8
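The effect of each modifier is easiest to see by running the same pattern with and without it. The following is a minimal sketch using made-up subjects; the variable names are only illustrative.

```php
<?php
$text = "first line\nsecond line";

// i: letter case is ignored.
$withI    = preg_match('/php/i', 'I like PHP');       // 1
$withoutI = preg_match('/php/', 'I like PHP');        // 0

// m: ^ can match at the start of any line, not only at the start of the subject.
$withM    = preg_match('/^second/m', $text);          // 1
$withoutM = preg_match('/^second/', $text);           // 0

// s: . is allowed to match the newline between the two lines.
$withS    = preg_match('/first.+second/s', $text);    // 1
$withoutS = preg_match('/first.+second/', $text);     // 0

// x: whitespace and # comments inside the pattern are ignored.
$withX = preg_match('/
    \d{5}   # five-digit ZIP code
    -       # separator
    \d{4}   # four-digit extension
/x', '21797-2046');                                   // 1

// u: "é" is one UTF-8 character but two bytes, so . only covers it in UTF-8 mode.
$withU    = preg_match('/^.$/u', "\u{E9}");           // 1
$withoutU = preg_match('/^.$/', "\u{E9}");            // 0

var_dump($withI, $withoutI, $withM, $withoutM, $withS, $withoutS, $withX, $withU, $withoutU);
```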
Extracting data with regular expressions:
To capture pieces of patterns, you must group the portions of the pattern you want to capture with parentheses.
Example:
To capture the two components of a ZIP+4 code into separate matches, you need to group them individually into subpatterns as follows:
/(\d{5})-(\d{4})/
After you’ve specified your capture subpatterns, you can read their matches by passing an array as the third parameter to preg_match().
The subpattern matches will be stored in the match array by their pattern number, which is determined by numbering the subpatterns left-to-right by the position of their opening parenthesis.
Index 0 of the array holds the text matched by the entire pattern.
Example:
$string = 'My zipcode is 21797-2046';
if (preg_match('/(\d{5})-(\d{4})/', $string, $matches)) {
    print_r($matches);
}
This will print:
Array
(
[0] => 21797-2046
[1] => 21797
[2] => 2046
)
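In practice the captured pieces are often wrapped in a small function. The sketch below is one possible way to do that; extractZipPlus4() and the array keys 'zip' and 'plus4' are hypothetical names invented for this example, not part of PHP.

```php
<?php
// Hypothetical helper: returns the two ZIP+4 components,
// or null when the text contains no ZIP+4 code.
function extractZipPlus4(string $text): ?array
{
    if (preg_match('/(\d{5})-(\d{4})/', $text, $matches) !== 1) {
        return null;
    }

    // $matches[0] is the whole match; 1 and 2 are the captured subpatterns.
    return ['zip' => $matches[1], 'plus4' => $matches[2]];
}

$parts = extractZipPlus4('My zipcode is 21797-2046');
$none  = extractZipPlus4('No code here');

var_dump($parts, $none);
```

Returning null for the no-match case keeps the caller from reading indexes of an empty match array.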