Probably an easy regex question.
How do I remove all non-digits except a leading + from a phone number?
For example:
012-3456 => 0123456
+1 (234) 56789 => +123456789
If global substitution is supported, you could simply replace every character that is not a digit or a plus sign:
s/[^0-9+]//g
If global substitution is not supported, you could instead match as many digit groups as your phone number format can contain and keep only those groups:
s/([0-9+]*)[^0-9+]*([0-9+]*)[^0-9+]*([0-9+]*)[^0-9+]*([0-9+]*)/\1\2\3\4/
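To see how the two substitutions above compare, here is a minimal Python sketch of both. The function names strip_global and strip_four_groups are made up for illustration, and count=1 is only my stand-in for a tool that lacks a global flag.

```python
import re

def strip_global(number: str) -> str:
    # Equivalent of s/[^0-9+]//g: delete every character that is not a digit or '+'.
    return re.sub(r"[^0-9+]", "", number)

# Same pattern as the second sed expression: four kept groups, three separators.
FOUR_GROUPS = re.compile(
    r"([0-9+]*)[^0-9+]*([0-9+]*)[^0-9+]*([0-9+]*)[^0-9+]*([0-9+]*)"
)

def strip_four_groups(number: str) -> str:
    # A single, non-global substitution that joins at most four digit groups.
    return FOUR_GROUPS.sub(r"\1\2\3\4", number, count=1)

print(strip_global("+1 (234) 56789"))            # +123456789
print(strip_four_groups("+1 (234) 56789"))       # +123456789
print(strip_four_groups("+1 (234) 567-89-00"))   # +123456789-00
```

The last call shows the limit of the second form: anything after the fourth digit group is left untouched, so the number of groups in the pattern has to cover the longest format you expect.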
Just replace everything except digits and + with ''
/[^\d+]/
In Python,
>>> import re
>>> re.sub(r"[^\d+]", "", "+1 (234) 56789")
'+123456789'
In Java, you can do
public static String trimmed(String phoneNumber) {
    return phoneNumber.replaceAll("[^+\\d]", "");
}
This will keep every +, even one in the middle of phoneNumber. If you want to remove any + in the middle, then do something like this:
return phoneNumber.replaceAll("[^+\\d]|(?<=.)\\+", "");
(?<=.) is a lookbehind that checks whether there is a character before the +.
System.out.println("[" + trimmed("+1 (234)++56789 ") + "]");
// prints "[+123456789]"
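The same lookbehind idea appears to carry over to Python unchanged. The following is a sketch rather than a tested port, with the function name simply reused from the Java version:

```python
import re

def trimmed(phone_number: str) -> str:
    # Delete anything that is not a digit or '+', and also delete any '+'
    # that has at least one character in front of it.
    return re.sub(r"[^+\d]|(?<=.)\+", "", phone_number)

print("[" + trimmed("+1 (234)++56789 ") + "]")   # [+123456789]
```

As in the Java version, a + that follows leading whitespace counts as being "in the middle" and is removed, so strip the input first if that matters.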
Use Perl:
my $number = '+1 (234) 56789';   # set this to the phone number
$number =~ s/[^\d+]//g;          # delete everything that is not a digit or a plus sign
This still allows a plus sign anywhere in the number. If you want to allow one only at the beginning, I will leave that part up to you; you won't learn if the entire answer is handed to you.
Essentially, the substitution replaces anything in $number that is not a digit or a plus sign with an empty string.
You cannot simply remove the '+' symbol. It has to be treated like '00' (the international dialing prefix in most countries) and belongs to the country code. There, '+xx' is the same as '00xx'.
Anyway, handling phone numbers with regex is like parsing html with regex...nearly impossible because there are so many (correct) spelling formats.
My advice would be to write a custom class for handling phone numbers and not to use regex.
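A custom class along those lines could start out like the Python sketch below. PhoneNumber and from_text are hypothetical names, and the rule that a leading '00' is equivalent to '+' is only an assumption that holds where 00 is the international dialing prefix. It is nowhere near a full phone number parser.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PhoneNumber:
    international: bool  # True if the input started with '+' or '00'
    digits: str          # the remaining digits, without any prefix

    @classmethod
    def from_text(cls, text: str) -> "PhoneNumber":
        text = text.strip()
        international = text.startswith("+")
        # Keep ASCII digits only; no regex involved.
        digits = "".join(ch for ch in text if ch in "0123456789")
        # Assumption: '00' is the international prefix, so '00xx' == '+xx'.
        if not international and digits.startswith("00"):
            international = True
            digits = digits[2:]
        return cls(international, digits)

    def __str__(self) -> str:
        return ("+" if self.international else "") + self.digits

print(PhoneNumber.from_text("+1 (234) 56789"))        # +123456789
print(PhoneNumber.from_text("0049 (123) 234 5678"))   # +491232345678
print(PhoneNumber.from_text("012-3456"))              # 0123456
```

Keeping the prefix and the digits as separate fields means the two spellings of an international number compare equal, which a plain string cleanup cannot give you.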
s/(?<!^)\+|[^\d+]+//g
will remove all non-digits and leave a leading + alone. Note that leading whitespace will cause the "leave + alone" bit to fail. In .NET languages, this can be worked into the regex; in others, you should strip whitespace before passing the string to this regex.
Explanation:
(?<!^)\+ : match a + unless it's at the start of the string. (In .NET, use (?<!^\s*)\+ to allow for leading whitespace.)
| : or
[^\d+]+ : match any run of characters that are neither digits nor +.
Before (using (?<!^\s*)\+|[^\d+]+):
+49 (123) 234 5678
+1 (555) 234-5678
+7 (23) 45/6789+10
(0123) 345/5678, ext. 666
After:
+491232345678
+15552345678
+72345678910
01233455678666
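For engines without variable-width lookbehind (Python's re module is one: it only accepts fixed-width lookbehind), the "strip whitespace first" route might look like this sketch. The names CLEANUP and clean are mine, not part of any library.

```python
import re

# Fixed-width variant of the pattern: a '+' that is not at the very start,
# or any run of characters that are neither digits nor '+'.
CLEANUP = re.compile(r"(?<!^)\+|[^\d+]+")

def clean(number: str) -> str:
    # Strip surrounding whitespace first so that a leading '+' really is at '^'.
    return CLEANUP.sub("", number.strip())

for raw in [
    "+49 (123) 234 5678",
    "  +1 (555) 234-5678",
    "+7 (23) 45/6789+10",
    "(0123) 345/5678, ext. 666",
]:
    print(clean(raw))
# +491232345678
# +15552345678
# +72345678910
# 01233455678666
```

Without the strip() call, the second input would lose its plus sign, which is exactly the leading-whitespace failure described above.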
It is certainly possible to do all of that in one regex, but I prefer simpler regexes that deal correctly with the leading plus and with leading and trailing whitespace:
#!/usr/bin/perl
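while (<DATA>) {
    print "DATA Read: \$_=$_";     # \n already there...
    s/^\s+|\s+$//g;                # trim leading and trailing whitespace, newline included
    my $s = s/^\+// ? '+' : '';    # remember a leading plus, then strip it
    s/\D//g;                       # delete everything that is not a digit
    print "Formatted: $s$_\n====\n";
}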
__DATA__
012-3456
+1 (234) 56789
 +1 (234) 56789
1234-56789 |
+12345+6789
Output:
DATA Read: $_=012-3456
Formatted: 0123456
====
DATA Read: $_=+1 (234) 56789
Formatted: +123456789
====
DATA Read: $_= +1 (234) 56789
Formatted: +123456789
====
DATA Read: $_=1234-56789 |
Formatted: 123456789
====
DATA Read: $_=+12345+6789
Formatted: +123456789
====