
I'm trying to extract phone numbers from a set of data. It has to be able to extract both international and local numbers for every country.

The rules I've laid out for it are:

1. Look for the international "+" symbol, indicating an international dialing number with a valid country code (from +1 to +999).
2. If the plus symbol is present, make sure the character immediately following it is a digit.
3. If there is no plus symbol, check that the number is between 7 and 10 digits long.
4. If the number is divided (according to international standards) by either a hyphen (-) or a space, make sure each group between the separators has either 3 or 4 digits.
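For reference, here is a minimal Python sketch of the four rules as a plain validator, separate from any regex. It is a hypothetical helper (`follows_rules` is not from the original post) and assumes that, with a leading "+", the country code is the first group and is separated from the rest by a space or hyphen, and that the 7-10 digit length from rule 3 also applies to the part after the country code.

```python
import re

def follows_rules(candidate: str) -> bool:
    """Hypothetical check of one candidate string against the four rules."""
    rest = candidate
    if candidate.startswith("+"):
        # Rules 1 and 2: "+" must be followed directly by a country code from
        # 1 to 999 (no leading zero), then a space or hyphen (an assumption).
        match = re.match(r"\+([1-9][0-9]{0,2})[-\s](.*)", candidate)
        if match is None:
            return False
        rest = match.group(2)
    groups = re.split(r"[-\s]", rest)
    # Every group must be a non-empty run of ASCII digits.
    if not all(re.fullmatch(r"[0-9]+", group) for group in groups):
        return False
    # Rule 4: if the number is split, each group has 3 or 4 digits.
    if len(groups) > 1 and any(len(group) not in (3, 4) for group in groups):
        return False
    # Rule 3: 7 to 10 digits in total (applied after the country code here).
    return 7 <= len("".join(groups)) <= 10
```

With these assumptions, "+44 1234 567 890", "555-1234" and "5551234567" pass, while "+0 555 1234", "+ 555 1234" and "55-51234" are rejected.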

What I've got so far is:

\+(?=[1-999])(\d{4}[0-9][-\s]\d{3}[0-9][-\s]\d{4}[0-9])|(\d{7,11}[0-9])

That's for international numbers, and the local search is \d{7,10}.

The thing is, it doesn't actually pick up numbers with spaces or hyphens in them. Can anybody give me some advice on it?

A:

\d already means "digit", so you shouldn't put another [0-9] after it (which means the same). Doing so just asks for one more digit: \d{4}[0-9] matches five digits, not four.
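A quick check in Python's `re` module makes the extra digit visible. For ASCII input, \d and [0-9] match the same single character; the pattern names here are only for illustration.

```python
import re

# \d{4}[0-9] is four digits followed by a fifth one.
four_plus_one = re.compile(r"\d{4}[0-9]")
four_digits = re.compile(r"\d{4}")

print(bool(four_plus_one.fullmatch("1234")))   # False
print(bool(four_plus_one.fullmatch("12345")))  # True
print(bool(four_digits.fullmatch("1234")))     # True
```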

In the same vein, [1-999] doesn't mean what you think it does. It is a character class that matches exactly one digit between 1 and 9. You probably want \d{1,3}, although that would also match 0.
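The snippet below illustrates both points in Python. The last pattern, [1-9][0-9]{0,2}, is one possible way (not part of the original answer) to accept only 1 to 999 without a leading zero.

```python
import re

# [1-999] is the range 1-9 plus two redundant 9s: a single character.
single = re.compile(r"[1-999]")
print(bool(single.fullmatch("7")), bool(single.fullmatch("44")))  # True False

# \d{1,3} accepts one to three digits, including "0".
loose = re.compile(r"\d{1,3}")
print(bool(loose.fullmatch("0")))  # True

# First digit 1-9, then up to two more digits: 1 to 999, no leading zero.
strict = re.compile(r"[1-9][0-9]{0,2}")
print([bool(strict.fullmatch(s)) for s in ("1", "44", "999", "0", "1000")])
# [True, True, True, False, False]
```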

Then, you're only allowing one way of dividing the blocks (4-3-4 as intended; with the extra [0-9]s it is actually 5-4-5). Why? This is not going to match many, many valid phone numbers.
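Running the question's pattern unchanged against a few sample strings (the samples are made up for illustration) shows the problem: the first alternative only accepts a 5-4-5 split after the "+", and the second alternative needs 8 to 12 consecutive digits, so any number broken up by spaces or hyphens is missed.

```python
import re

# The pattern from the question, unchanged.
question_pattern = re.compile(
    r"\+(?=[1-999])(\d{4}[0-9][-\s]\d{3}[0-9][-\s]\d{4}[0-9])|(\d{7,11}[0-9])"
)

samples = ["+12345 6789 12345", "+44 20 7946 0958", "555-123-4567", "5551234567"]
for s in samples:
    print(s, "->", bool(question_pattern.search(s)))
```

Only "+12345 6789 12345" and the unseparated "5551234567" match; "+44 20 7946 0958" and "555-123-4567" do not.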

I would suggest the following:

Search your string using the regex \+?(?=\d)[\d\s-]{7,13}\b to grab anything that remotely looks like a phone number. Perhaps you also want to include parentheses and slashes in the allowed character list: \+?(?=\d)[\d\s/()-]{7,14}\b
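A minimal sketch of that first pass in Python, using the broader pattern with parentheses and slashes; the sample text is invented for illustration. Note that the length limit counts separators too, so a long, heavily formatted number can be cut short, which is one more reason to validate the candidates afterwards.

```python
import re

# Optional "+", then 7-14 characters from digits, whitespace, "/", "(", ")"
# and "-", ending at a word boundary.
candidate_pattern = re.compile(r"\+?(?=\d)[\d\s/()-]{7,14}\b")

text = "Office: +1 555-123-4567, home: 555 1234."
candidates = candidate_pattern.findall(text)
print(candidates)  # ['+1 555-123-4567', '555 1234']
```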

Then process and validate those strings separately, best after removing all punctuation/whitespace (except the +).
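The second step could look roughly like this. Both functions are hypothetical, and the length check is only a placeholder: 7 is the question's local minimum, and 15 is the maximum number of digits in an E.164 international number. Real validation would need per-country rules.

```python
import re

def normalize(candidate: str) -> str:
    # Keep a leading "+" and the ASCII digits; drop spaces, hyphens,
    # slashes and parentheses.
    stripped = candidate.strip()
    prefix = "+" if stripped.startswith("+") else ""
    return prefix + re.sub(r"[^0-9]", "", stripped)

def looks_valid(number: str) -> bool:
    # Placeholder length check only (7 to 15 digits, "+" not counted).
    digit_count = len(number.lstrip("+"))
    return 7 <= digit_count <= 15

for raw in ["+1 555-123-4567", "(555) 123-4567", "12-34"]:
    number = normalize(raw)
    print(raw, "->", number, looks_valid(number))
```

This prints "+15551234567 True", "5551234567 True" and "1234 False" for the three inputs.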

Tim Pietzcker
A: 

I'm not sure it will be possible to create a regex to match every country - some countries have conflicting rules.

It's entirely possible to have, e.g., two valid local numbers contained within one valid international number.

You might want to start by looking at some of the answers to this question:

http://stackoverflow.com/questions/123559/a-comprehensive-regex-for-phone-number-validation

If you're looking to create something definitive for every country, good luck, and you'll probably need to spend a while with some technical standards...

For example, both 177 and 186-0039-011-81-90-1177-1177 are valid phone numbers in the same country.
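Counting digits makes the mismatch with a simple length rule concrete: the question's 7-10 digit check (rule 3) would reject both of these numbers. The helper `digit_count` is just for illustration.

```python
import re

def digit_count(number: str) -> int:
    # Count ASCII digits, ignoring hyphens and other separators.
    return len(re.sub(r"[^0-9]", "", number))

for number in ["177", "186-0039-011-81-90-1177-1177"]:
    n = digit_count(number)
    print(number, n, 7 <= n <= 10)
# 177 3 False
# 186-0039-011-81-90-1177-1177 22 False
```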

Colin Pickard