Hi,

I would like to extract phone numbers from a file. The numbers come in several forms. I can write a regex for any single form, but I don't know how to write one regex that covers all of them. For example:

  1. xxx-xxx-xxxx

  2. (xxx)xxx-xxxx

  3. xxx xxx xxxx

  4. xxxxxxxxxx

I can only handle 1, 3, and 4 together:

grep '[0-9]\{3\}[ -]\?[0-9]\{3\}[ -]\?[0-9]\{4\}' file

Is there a single regex that can handle all four of these forms?

+2  A: 

You can just OR (|) your regexes together -- will be more readable that way too!

scrible
Can you show me an example? I know OR (|) might work, but I couldn't figure out how.
skydoor
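
For what it's worth, here is a minimal sketch of the OR approach, assuming a grep that supports extended regexes (-E). The sample numbers and the variable names (sample, pattern, matches) are made up for illustration; each alternative covers one of the four forms from the question.

```shell
# Hypothetical sample input, one candidate per line.
sample=$(printf '%s\n' '333-444-5555' '(333)333-6666' '123 456 7890' '1234567890' 'not a number')

# One alternative per form, joined with | (extended regex syntax, hence -E).
# In ERE, \( and \) are literal parentheses.
pattern='[0-9]{3}-[0-9]{3}-[0-9]{4}|\([0-9]{3}\)[0-9]{3}-[0-9]{4}|[0-9]{3} [0-9]{3} [0-9]{4}|[0-9]{10}'

# Keep only the lines that contain a match for at least one alternative.
matches=$(printf '%s\n' "$sample" | grep -E "$pattern")
printf '%s\n' "$matches"
```

To run it against a file instead of a variable, something like grep -E "$pattern" file should behave the same way.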
+1  A: 

My first thought is that you may find it easier to see if your candidate number matches against one of four regular expressions. That will be easier to develop/debug, especially as/when you have to handle additional formats in the future.

Brian Agnew
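
A rough sketch of the "one regex per format" idea, assuming grep -E with several -e patterns (a line is accepted if any one of them matches) and -x so the whole candidate has to match. The function name is_phone is hypothetical, not something from the question.

```shell
# is_phone (hypothetical helper): succeeds if the whole argument
# matches one of the four formats. Each -e is a separate pattern,
# so adding a new format later is one more line.
is_phone() {
    printf '%s\n' "$1" | grep -Eqx \
        -e '[0-9]{3}-[0-9]{3}-[0-9]{4}' \
        -e '\([0-9]{3}\)[0-9]{3}-[0-9]{4}' \
        -e '[0-9]{3} [0-9]{3} [0-9]{4}' \
        -e '[0-9]{10}'
}

for candidate in '333-444-5555' '(333)333-6666' '123 456 7890' '1234567890' '(333-444-5555'; do
    if is_phone "$candidate"; then
        echo "match:    $candidate"
    else
        echo "no match: $candidate"
    fi
done
```

Because each format is its own pattern, a failing case can be debugged by testing the patterns one at a time.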
A: 

I got this:

debian:tmp$ cat p.txt
333-444-5555
(333)333-6666
123 456 7890
1234567890
debian:tmp$ egrep '\(?[0-9]{3}[ )-]?[0-9]{3}[ -]?[0-9]{4}' p.txt
333-444-5555
(333)333-6666
123 456 7890
1234567890
debian:tmp$ egrep --version
GNU grep 2.5.3

Copyright (C) 1988, 1992-2002, 2004, 2005  Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

debian:tmp$
Segfault
This will also match (333-444-5555.
Joel
yes, so it will.
Segfault
I tried this but I don't think it is correct.
skydoor
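
To make the objection concrete, here is a small sketch using -o (print only the matched text), which is a GNU/BSD grep option rather than a POSIX one, so treat its availability as an assumption. The variable names are made up. Since the optional opening parenthesis and the ")" in the separator class are independent of each other, unbalanced input is picked up whole.

```shell
# The single-pattern regex from the answer above.
pattern='\(?[0-9]{3}[ )-]?[0-9]{3}[ -]?[0-9]{4}'

# Opening parenthesis with no closing one: still matched, "(" included.
unclosed=$(printf '%s\n' '(333-444-5555' | grep -Eo "$pattern")
echo "$unclosed"

# Closing parenthesis with no opening one: also matched.
unopened=$(printf '%s\n' '333)444-5555' | grep -Eo "$pattern")
echo "$unopened"
```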
A: 
grep -P '[0-9]{3}-[0-9]{3}-[0-9]{4}|[0-9]{3} [0-9]{3} [0-9]{4}|[0-9]{10}|\([0-9]{3}\)[0-9]{3}-[0-9]{4}' file
D W
+1  A: 
grep '\(([0-9]\{3\})\|[0-9]\{3\}\)[ -]\?[0-9]\{3\}[ -]\?[0-9]\{4\}' file

Explanation:

([0-9]\{3\}) three digits inside parentheses

\| or

[0-9]\{3\} three digits not inside parens

...with grouping parentheses - \(...\) - around the alternation so the rest of the regex behaves the same no matter which alternative matches.

Alan Moore
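
As a sketch of how the grouped alternation behaves, the snippet below runs the basic-regex form from this answer next to what should be an equivalent extended-regex form, assuming GNU grep (\? and \| in basic syntax, and the -o option, are extensions). The sample lines and variable names are made up for illustration.

```shell
sample=$(printf '%s\n' '333-444-5555' '(333)333-6666' '123 456 7890' '1234567890' '(333-444-5555')

# Basic regex form: \( \) group, \| alternates, plain ( ) are literal.
bre=$(printf '%s\n' "$sample" | grep -o '\(([0-9]\{3\})\|[0-9]\{3\}\)[ -]\?[0-9]\{3\}[ -]\?[0-9]\{4\}')

# Extended regex form: ( ) group, | alternates, \( \) are literal.
ere=$(printf '%s\n' "$sample" | grep -Eo '(\([0-9]{3}\)|[0-9]{3})[ -]?[0-9]{3}[ -]?[0-9]{4}')

# Print the text each line contributed as a match.
printf '%s\n' "$ere"
```

For the last sample line, the unbalanced "(" is left out of the extracted text, because the parentheses only match as a pair. Without -o, grep still prints that line, since it contains a valid number.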