Probably an easy regex question.

How do I remove all non-digits except a leading + from a phone number?

i.e.

012-3456 => 0123456
+1 (234) 56789 => +123456789

+1  A: 

If global regular expressions are supported you could simply replace all characters that are not a digit or plus symbol:

s/[^0-9+]//g

If global regular expressions are not supported you could match as many possible number groups as might be valid in your given phone number format:

s/([0-9+]*)[^0-9+]*([0-9+]*)[^0-9+]*([0-9+]*)[^0-9+]*([0-9+]*)/\1\2\3\4/
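
To see where the fixed-group version stops working, here is a minimal Python sketch that emulates a non-global substitution with `count=1`. The name `strip_once` is made up for illustration; the pattern is the one above.

```python
import re

# Four groups of wanted characters separated by three runs of unwanted
# ones, mirroring the non-global expression above.
PATTERN = re.compile(
    r"([0-9+]*)[^0-9+]*([0-9+]*)[^0-9+]*([0-9+]*)[^0-9+]*([0-9+]*)"
)

def strip_once(phone: str) -> str:
    """Apply the substitution a single time (no global flag)."""
    return PATTERN.sub(r"\1\2\3\4", phone, count=1)

print(strip_once("+1 (234) 56789"))  # +123456789
print(strip_once("1-2-3-4-5"))       # 1234-5
```

The second call shows the limit: with more than three separator runs, the tail of the string is left untouched, so you would need to add more groups.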
Trey
This doesn't handle the + at the front correctly.
Marcelo Cantos
Yes it does - you could make a case against it that it won't remove a `+` *inside* the string, but that seems unlikely for a phone number.
Tim Pietzcker
@Marcelo, there is + in the regex, please double check it
S.Mark
@Marcelo: how does this not handle the leading +? The leading + is supposed to be included. Can a phone number have +'s throughout?
Trey
The question specifically indicated a + at the front. But I take the point that this is unlikely, so I'll remove the -1.
Marcelo Cantos
A: 

Just replace everything except digits and + with ''

/[^\d+]/

In Python,

>>> import re
>>> re.sub(r"[^\d+]", "", "+1 (234) 56789")
'+123456789'
>>>
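
A small sketch of what this pattern does and does not do, assuming a hypothetical helper named `keep_digits_and_plus`: inside a character class the `+` is a literal, so every `+` is kept, not only a leading one.

```python
import re

def keep_digits_and_plus(phone: str) -> str:
    # Inside [...], "+" is a literal plus sign, not a quantifier.
    return re.sub(r"[^\d+]", "", phone)

print(keep_digits_and_plus("+1 (234) 56789"))  # +123456789
print(keep_digits_and_plus("+12345+6789"))     # +12345+6789
```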
S.Mark
This doesn't handle the + at the front correctly.
Marcelo Cantos
@Marcelo, there is + in the regex, please double check it.
S.Mark
The question specifically indicated a + at the front. But I take the point that this is unlikely, so I'll remove the -1.
Marcelo Cantos
@Marcelo: the `+` inside a character class definition is not a metacharacter.
polygenelubricants
I know, @polygenelubricants. My original point was that the OP specified that a `+` at the front should be kept, not just any `+`.
Marcelo Cantos
+2  A: 

In Java, you can do

public static String trimmed(String phoneNumber) {
   return phoneNumber.replaceAll("[^+\\d]", "");
}

This will keep all +, even if it's in the middle of phoneNumber. If you want to remove any + in the middle, then do something like this:

return phoneNumber.replaceAll("[^+\\d]|(?<=.)\\+", "");

(?<=.) is a lookbehind that checks whether any character precedes the +, so only a + at the very start of the string is kept.

System.out.println("[" + trimmed("+1 (234)++56789 ") + "]");
// prints "[+123456789]"
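
For comparison, a rough Python port of the second pattern (a sketch, not tested against the Java version beyond these inputs). It also shows a caveat: the lookbehind looks at the original string, so a + that follows leading whitespace is removed as well.

```python
import re

def trimmed(phone: str) -> str:
    # Drop anything that is not "+" or a digit, and drop any "+"
    # that has at least one character before it.
    return re.sub(r"[^+\d]|(?<=.)\+", "", phone)

print("[" + trimmed("+1 (234)++56789 ") + "]")  # [+123456789]
print("[" + trimmed(" +1 234") + "]")           # [1234]
```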
polygenelubricants
A: 

Use Perl:

my $number = '+1 (234) 56789';  # set this to the phone number
$number =~ s/[^\d+]//g;

This still allows a plus sign anywhere. If you want to allow one only at the beginning, I will leave that part up to you; you can't just have the entire answer given to you, or else you won't learn.

Essentially, it replaces anything in $number that is not a digit or a plus sign with an empty string.

Silmaril89
A: 

You cannot simply remove the '+' symbol. It has to be treated like '00' and belongs to the country code. '+xx' is the same as '00xx'.

Anyway, handling phone numbers with regex is like parsing HTML with regex: nearly impossible, because there are so many (correct) ways to write them.

My advice would be to write a custom class for handling phone numbers and not to use regex.
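
A minimal sketch of what such a class might look like, without any regex. The name `PhoneNumber` and its `parse` method are hypothetical, and treating a leading `00` as equivalent to `+` simply follows the claim above; that prefix is not used in every country, so take it as an assumption.

```python
from dataclasses import dataclass

ASCII_DIGITS = "0123456789"

@dataclass(frozen=True)
class PhoneNumber:
    international: bool  # True if written with a leading "+" or "00"
    digits: str          # digits only, without the international prefix

    @classmethod
    def parse(cls, raw: str) -> "PhoneNumber":
        text = raw.strip()
        international = text.startswith("+")
        digits = "".join(ch for ch in text if ch in ASCII_DIGITS)
        # Assumption: a leading "00" means the same as a leading "+".
        if not international and digits.startswith("00"):
            international = True
            digits = digits[2:]
        return cls(international, digits)

    def __str__(self) -> str:
        return ("+" if self.international else "") + self.digits

print(PhoneNumber.parse("0049 (123) 234 5678"))  # +491232345678
print(PhoneNumber.parse("012-3456"))             # 0123456
```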

Alexander
+5  A: 
s/(?<!^)\+|[^\d+]+//g

will remove all non-digits and leave a leading + alone. Note that leading whitespace will cause the "leave + alone" bit to fail. In .NET languages, this can be worked into the regex; in others, you should strip whitespace before passing the string to this regex.

Explanation:

(?<!^)\+: Match a + unless it's at the start of the string. (In .NET, use (?<!^\s*)\+ to allow for leading whitespace).

| or

[^\d+]+: match any run of characters that are neither numbers nor +.

Before (using (?<!^\s*)\+|[^\d+]+):

+49 (123) 234 5678
  +1 (555) 234-5678
+7 (23) 45/6789+10
(0123) 345/5678, ext. 666

After:

+491232345678
+15552345678
+72345678910
01233455678666
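
In Python, whose `re` module only accepts fixed-width lookbehinds, the strip-first variant could look roughly like this. The function name `clean` is an assumption for the example; the pattern is the first one above.

```python
import re

PATTERN = re.compile(r"(?<!^)\+|[^\d+]+")

def clean(phone: str) -> str:
    # Strip surrounding whitespace first so a leading "+" really sits at
    # position 0; the lookbehind itself cannot skip whitespace here.
    return PATTERN.sub("", phone.strip())

samples = [
    "+49 (123) 234 5678",
    "  +1 (555) 234-5678",
    "+7 (23) 45/6789+10",
    "(0123) 345/5678, ext. 666",
]
for raw in samples:
    print(clean(raw))
```

This should print the same four lines as the "After" listing.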
Tim Pietzcker
Works perfect. Thanks for the detailed breakdown
adrianm
Clever. `(?<!^)\+` is that like a negated look-ahead or something?
Mark
@Mark: It's a negative lookbehind. `(?<!^)` means "Assert that it is impossible to match the beginning of the string at the current position", and only then does `\+` match a plus sign.
Tim Pietzcker
Right...that makes more sense :)
Mark
FYI, Python is one such language that only supports fixed-width look-behinds. i.e., it doesn't support `(?<!^\s*)`
Mark
+1  A: 

It is certainly possible to do that all in one regex, but I prefer simpler regexes that deal correctly with the leading plus and with leading and trailing whitespace:

#!/usr/bin/perl 
while (<DATA>) {
    print "DATA Read: \$_=$_";  #\n already there...
    s/\s*(.*)\s*/$1/g;
    $s=s/(^\+){0,1}//?$1:'';
    s/[^\d]//g;
    print "Formatted: $s$_\n====\n";
}


__DATA__
012-3456
+1 (234) 56789
         +1 (234) 56789
1234-56789        |
+12345+6789

Output:

DATA Read: $_=012-3456
Formatted: 0123456
====
DATA Read: $_=+1 (234) 56789
Formatted: +123456789
====
DATA Read: $_=         +1 (234) 56789
Formatted: +123456789
====
DATA Read: $_=1234-56789        |
Formatted: 123456789
====
DATA Read: $_=+12345+6789
Formatted: +123456789
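
The same step-by-step idea in Python, as a rough sketch (the name `format_number` is made up here): trim, remember whether there was a leading +, then drop everything that is not a digit.

```python
import re

def format_number(raw: str) -> str:
    text = raw.strip()                        # leading/trailing whitespace
    prefix = "+" if text.startswith("+") else ""
    digits = re.sub(r"\D", "", text)          # removes interior "+" too
    return prefix + digits

for raw in ["012-3456", "         +1 (234) 56789",
            "1234-56789        |", "+12345+6789"]:
    print(format_number(raw))
```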
drewk