Although the answer to strip all whitespace is neat, it doesn't really solve the problem that's posed, which is to find a regex. Take, for instance, my test script that downloads a web page and extracts all phone numbers using the regex. Since you'd need a regex anyway, you might as well have the regex do all the work. I came up with this:
1?\W*([2-9][0-8][0-9])\W*([2-9][0-9]{2})\W*([0-9]{4})(\se?x?t?(\d*))?
Here's a perl script to test it. When you match, $1 contains the area code, $2 and $3 contain the phone number, and $5 contains the extension. My test script downloads a file from the internet and prints all the phone numbers in it.
#!/usr/bin/perl
my $us_phone_regex =
'1?\W*([2-9][0-8][0-9])\W*([2-9][0-9]{2})\W*([0-9]{4})(\se?x?t?(\d*))?';
my @tests =
(
"1-234-567-8901",
"1-234-567-8901 x1234",
"1-234-567-8901 ext1234",
"1 (234) 567-8901",
"1.234.567.8901",
"1/234/567/8901",
"12345678901",
"not a phone number"
);
foreach my $num (@tests)
{
if( $num =~ m/$us_phone_regex/ )
{
print "match [$1-$2-$3]\n" if not defined $4;
print "match [$1-$2-$3 $5]\n" if defined $4;
}
else
{
print "no match [$num]\n";
}
}
#
# Extract all phone numbers from an arbitrary file.
#
my $external_filename =
'http://web.textfiles.com/ezines/PHREAKSANDGEEKS/PnG-spring05.txt';
my @external_file = `curl $external_filename`;
foreach my $line (@external_file)
{
if( $line =~ m/$us_phone_regex/ )
{
print "match $1 $2 $3\n";
}
}
Edit:
You can change \W* to \s*\W?\s* in the regex to tighten it up a bit. I wasn't thinking of the regex in terms of, say, validating user input on a form when I wrote it, but this change makes it possible to use the regex for that purpose.
'1?\s*\W?\s*([2-9][0-8][0-9])\s*\W?\s*([2-9][0-9]{2})\s*\W?\s*([0-9]{4})(\se?x?t?(\d*))?';