
Hey Everyone,

I have a simple problem. I have a phone number like this: +1 (123) 123-1234, and I want to extract just the digits from that string using a regex. Any help will be greatly appreciated.

Thanks

+12  A: 

This will strip out any non-numeric characters:

using System.Text.RegularExpressions;
string input = "+1 (123) 123-1234";
string digits = Regex.Replace(input, @"\D", string.Empty);
Chris Pebble
Won't that remove all the digits?
C. Ross
Thanks chris that worked out quite well.
\D (capital D) means non-digits; \d (lowercase d) means digits.
Erv Walter
Only thing I'd change is Regex.Replace(input, @"\D", string.Empty), but I'm anal like that.
Jarrett Meyer
@Jarrett Meyer, good point, that does clarify the intention. Code updated.
Chris Pebble
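To make the `\D` versus `\d` distinction from the comments concrete, here is a small sketch (the `DigitExtraction` class and its method names are illustrative, not from the answer). Deleting every `\D` match keeps only the digits, while concatenating every `\d` match builds the same string from the other direction.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class DigitExtraction
{
    // \D (uppercase) matches any NON-digit; deleting those matches leaves only digits.
    public static string ByRemovingNonDigits(string input)
    {
        return Regex.Replace(input, @"\D", string.Empty);
    }

    // \d (lowercase) matches a single digit; joining all matches gives the same result.
    // Cast<Match>() is needed because MatchCollection predates generic IEnumerable<Match>.
    public static string ByCollectingDigits(string input)
    {
        return string.Concat(Regex.Matches(input, @"\d")
                                  .Cast<Match>()
                                  .Select(m => m.Value));
    }
}
```

For the OP's input "+1 (123) 123-1234", both methods return "11231231234".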
A: 

Why not just do a replace?

string phoneNumber = "+1 (123) 123-1234";
phoneNumber = phoneNumber.Replace("+", "");
phoneNumber = phoneNumber.Replace("(", "");
phoneNumber = phoneNumber.Replace(")", "");
phoneNumber = phoneNumber.Replace("-", "");
phoneNumber = phoneNumber.Replace(" ", "");
C. Ross
Because that way you've just created 6 strings. From http://bit.ly/fi0Rq: "A String object is called immutable (read-only) because its value cannot be modified once it has been created. Methods that appear to modify a String object actually return a new String object that contains the modification. If it is necessary to modify the actual contents of a string-like object, use the System.Text.StringBuilder class."
Zhaph - Ben Duguid
And? Premature optimization (except in the 2% of cases)... you know the deal. Honestly, why is a simple (if crude) answer worth 2 downvotes?
C. Ross
+1: it's not. It's correct, and the 'more ideal' solution will float up higher.
SnOrfus
You still won't handle any other out-of-place characters (such as ", <, >, #, etc.). It is not simple (you have to do a Replace for every non-digit character to be correct), which makes it needlessly verbose. At least put the characters to ignore in a list and iterate over the list. At that point, why not just iterate over the string and append the characters that *are* digits to a result string?
Erich Mirabal
@C.Ross, indeed I do, and I didn't downvote; I just supplied a comment. To be fair, the OP asked for a RegEx, and to paraphrase Jamie Zawinski, "you have a problem, and you try to solve it with a RegEx; now you have two problems".
Zhaph - Ben Duguid
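Erich's suggestion of iterating over the string and keeping only the digits, combined with the `StringBuilder` advice above, could look roughly like the sketch below. It avoids the chain of intermediate strings and does not depend on LINQ, so it should also work on pre-3.5 frameworks. The `PhoneUtil.DigitsOnly` name is hypothetical.

```csharp
using System.Text;

public static class PhoneUtil
{
    // Appends only the characters char.IsDigit accepts, so punctuation such as
    // +, (, ), -, spaces, #, < and > is dropped without listing it explicitly.
    public static string DigitsOnly(string input)
    {
        // Pre-size the buffer: the result can never be longer than the input.
        StringBuilder sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (char.IsDigit(c))
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

Note that, like `\d`, `char.IsDigit` accepts any Unicode decimal digit, not only 0-9.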
+2  A: 
string digits = Regex.Replace(input, @"[^\d]", String.Empty);
George Mauer
\D (capital D) means non-digits
Erv Walter
Oh, you learn something new every day. I'll take that out, then.
George Mauer
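One subtlety that applies to both `\D` and `[^\d]`: in .NET, `\d` matches any Unicode decimal digit (category Nd), not just 0-9, so digits from other scripts survive the replace. If only ASCII digits should be kept, `[^0-9]` is stricter (specifying `RegexOptions.ECMAScript` also restricts `\d` to `[0-9]`). A sketch, with the `DigitFilters` class and method names made up for illustration:

```csharp
using System.Text.RegularExpressions;

public static class DigitFilters
{
    // \D and [^\d] are equivalent: both remove everything that is not a
    // Unicode decimal digit, so e.g. Arabic-Indic digits are kept.
    public static string UnicodeDigits(string input)
    {
        return Regex.Replace(input, @"[^\d]", string.Empty);
    }

    // [^0-9] removes everything except the ASCII digits 0 through 9.
    public static string AsciiDigits(string input)
    {
        return Regex.Replace(input, "[^0-9]", string.Empty);
    }
}
```

For the OP's sample both methods return "11231231234"; they only differ when the input contains non-ASCII digits.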
+5  A: 

Using RegEx is one solution. Another way would be to use LINQ (provided you are using .NET 3.5 or later):

    using System;
    using System.Linq;

    string myPhone = "+1 (123) 123-1234";
    string StrippedPhone = new string((from c in myPhone
                                       where Char.IsDigit(c)
                                       select c).ToArray());
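The same filter can also be written in LINQ method syntax by passing `char.IsDigit` as a method group. A sketch, assuming .NET 3.5 or later; the `PhoneLinq` class name is just for illustration:

```csharp
using System.Linq;

public static class PhoneLinq
{
    // Method-syntax equivalent of the query expression: Where keeps the chars
    // for which char.IsDigit returns true, and ToArray feeds the string constructor.
    public static string DigitsOnly(string input)
    {
        return new string(input.Where(char.IsDigit).ToArray());
    }
}
```

Both forms compile to the same `Where` call, so they should produce identical strings.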

The end result is the same, but I think LINQ offers some advantages over RegEx in this case. First, readability: the RegEx requires you to know that "\D" means "non-digit" (compared to Char.IsDigit()), and there is already confusion about that in the comments here. Second, performance: I did a very simple benchmark, performing each method 100,000 times.

LINQ: 127ms

RegEx: 485ms

So, at a quick glance, it seems like LINQ outperforms RegEx in this situation. And I'd argue it is more readable.

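    using System;
    using System.Diagnostics;
    using System.Linq;
    using System.Text.RegularExpressions;

    int i;
    const int TIMES = 100000;
    Stopwatch sw = new Stopwatch();
    string myPhone = "+1 (123) 123-1234";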

    // Using LINQ            
    sw.Start();
    for (i = 0; i < TIMES; i++)
    {
        string StrippedPhone = new string((from c in myPhone
                                           where Char.IsDigit(c)
                                           select c).ToArray());
    }
    sw.Stop();
    Console.WriteLine("Linq took {0}ms", sw.ElapsedMilliseconds);

    // Reset 
    sw.Reset();

    // Using RegEx
    sw.Start();
    for (i = 0; i < TIMES; i++)
    {
        string digits = Regex.Replace(myPhone, @"\D", string.Empty);
    }
    sw.Stop();
    Console.WriteLine("RegEx took {0}ms", sw.ElapsedMilliseconds);

    Console.ReadLine();
Rob P.
True, and +1 for that, it's quite nice and elegant, but the OP may not be able to use .NET 3.5 at this point ;)
Zhaph - Ben Duguid
That's a good point too; I forget that not everyone is using 3.5. I'll edit to reflect that.
Rob P.
Also, just to further emphasize your point, "\D" means "NON-digit", not digit, so yeah, Char.IsDigit() seems more readable to me too. The regex solution is concise, though. :)
Scott Ferguson
Wow, impressive. Regex performance can be improved using RegexOptions.Compiled, but a quick test shows that even then it's only marginally faster than LINQ. I learned something today. Thanks, Rob P!
Chris Pebble
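Following up on the `RegexOptions.Compiled` comment: when the regex approach runs in a hot loop, a common pattern is to construct the `Regex` once and reuse the instance rather than calling the static `Regex.Replace` each time. (The static methods do keep a small internal cache of parsed patterns, so most of the potential gain comes from the `Compiled` option itself.) A sketch with a hypothetical `PhoneRegex` class; no timings are claimed here, so measure in your own setting.

```csharp
using System.Text.RegularExpressions;

public static class PhoneRegex
{
    // Created once, when the class is first used. RegexOptions.Compiled trades
    // a slower first-time construction for faster matching on later calls.
    private static readonly Regex NonDigits =
        new Regex(@"\D", RegexOptions.Compiled);

    public static string DigitsOnly(string input)
    {
        // Instance Replace(input, replacement) reuses the prebuilt regex.
        return NonDigits.Replace(input, string.Empty);
    }
}
```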