Let's break the code into two lines.
preg_replace("~[^0-9]~", "", $phone);
First, we're going to replace every match of a regex with an empty string (in other words, delete the matches from the string). The regex is [^0-9] (the ~ on each end is a delimiter). [...] in a regex defines a character class, which tells the regex engine to match one character within the class. Dashes are generally special characters inside a character class, and are used to specify a range (i.e. 0-9 means all characters between 0 and 9, inclusive).
You can think of a character class as shorthand for a big OR condition: e.g. [0-9] is shorthand for 0 or 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9. Note that classes don't have to contain ranges, either -- [aeiou] is a character class that matches a or e or i or o or u (or in other words, any vowel).
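As a rough illustration of a character class matching one character at a time, the sketch below counts the digits and the lowercase vowels in a string. The sample text and the variable names are made up for this example.

```php
<?php
// Sample string invented for this illustration.
$text = "Call 555-0199 about the audio";

// [0-9] matches any single digit; preg_match_all() collects every match
// and returns how many it found.
$digitCount = preg_match_all('~[0-9]~', $text, $digits);

// [aeiou] contains no range: it simply lists the characters it accepts.
$vowelCount = preg_match_all('~[aeiou]~', $text, $vowels);

echo $digitCount, "\n";               // number of digit characters
echo implode('', $digits[0]), "\n";   // the digits, in order of appearance
echo $vowelCount, "\n";               // number of lowercase vowels
```

Each element of $digits[0] is a single character, because a class on its own consumes exactly one character per match.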
When the first character in the class is ^, the class is negated, which means that the regex engine should match any character that isn't in the class. So when you put all that together, the first line removes anything that isn't a digit (a character between 0 and 9) from $phone.
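Here is a minimal sketch of that first step on a made-up phone number. One thing worth noting: preg_replace() returns the cleaned string rather than modifying its argument, so this sketch assumes the result is assigned back to $phone.

```php
<?php
// Hypothetical input; any punctuation or spacing could appear here.
$phone = "(555) 123-4567";

// Delete every character that is NOT in the range 0-9.
// preg_replace() returns the new string, so the result is assigned back.
$phone = preg_replace('~[^0-9]~', '', $phone);

echo $phone, "\n"; // 5551234567
```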
preg_match('~([0-9]{3})([0-9]{3})([0-9]{4})~', $phone, $matches);
The second line tries to match $phone against a second expression, and puts the results into an array called $matches, if a match is made. You will note there are three sets of parentheses; these define capturing groups -- i.e. if the pattern as a whole matches, you will end up with three submatches, which in this case will contain the area code, prefix and suffix of the phone number. In general, anything contained in parentheses in a regular expression is capturing (while there are exceptions, they are beyond the scope of this explanation). Groups can be useful for other things too, without wanting the overhead of capturing, so a group can be made non-capturing by prefacing its contents with ?: (i.e. (?:...)).
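To make the capturing groups concrete, this sketch shows what ends up in $matches for a sample ten-digit string, and how a non-capturing group changes the result. The input, the output format and the names $formatted and $partial are assumptions for illustration only.

```php
<?php
$phone = "5551234567"; // sample digits-only input

if (preg_match('~([0-9]{3})([0-9]{3})([0-9]{4})~', $phone, $matches)) {
    // $matches[0] holds the whole match; the groups are numbered from 1.
    [$whole, $area, $prefix, $suffix] = $matches;
    $formatted = "($area) $prefix-$suffix";
    echo $formatted, "\n"; // (555) 123-4567
}

// Making the first group non-capturing with ?: drops it from the results,
// so only the prefix and suffix are captured.
preg_match('~(?:[0-9]{3})([0-9]{3})([0-9]{4})~', $phone, $partial);
echo count($partial), "\n"; // 3: the whole match plus two groups
```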
Each group does a similar thing: [0-9]{3} or [0-9]{4}. As we saw above, [0-9] defines a character class containing the digits between 0 and 9 (as the classes here don't start with ^, these are not negated classes). The {3} or {4} is a repetition operator, which says "match exactly 3 (or 4) of the previous token (or group)". So [0-9]{3} will match exactly three digits in a row, and [0-9]{4} will match exactly four digits in a row. Note that the digits don't have to be all the same (i.e. 111), because the character class is evaluated for each repetition (so 123 will match because 1 matches [0-9], then 2 matches [0-9], and then 3 matches [0-9]).
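A small sketch of the repetition operator follows. The ^ and $ anchors in the third check are not part of the original pattern; they are added here only to show that, without them, "exactly three" describes the length of the match and not the length of the whole string.

```php
<?php
$pattern = '~[0-9]{3}~';

// Three different digits still match: the class is re-evaluated each time.
$threeDifferent = preg_match($pattern, '123');       // 1

// Two digits are not enough for {3}.
$tooShort = preg_match($pattern, '12');              // 0

// Unanchored, the pattern finds three digits inside a longer run.
$unanchored = preg_match($pattern, '1234', $m);      // 1, $m[0] is "123"

// With anchors (an addition for this example), the whole string
// must be exactly three digits.
$anchored = preg_match('~^[0-9]{3}$~', '1234');      // 0

echo $threeDifferent, $tooShort, $unanchored, $anchored, "\n"; // 1010
```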