I want to extract valid (by format) mobile numbers from a text.

e.g. Input: some text (987) 456 7890, (987)-456-7890 again some text

Output: 9874567890 9874567890

The problem is that there are many valid mobile number formats all over the world, for example:

String text = "Denmark 11 11 11 11, 1111 1111 "
        + "Germany 03333 123456, +49 (3333) 123456 "
        + "Netherlands + 31 44 12345678 Russia +7(555)123-123 "
        + "Spain 12-123-12-12 Switzerland +41 11 222 22 22 "
        + "UK (01222) 333333 India +91-12345-12345 "
        + "Australia (04) 1231 1231 USA (011) 154-123-4567 "
        + "China 1234 5678 France 01-23-45-67-89 "
        + "Poland (12) 345 67 89 Singapore 123 4567 "
        + "Thailand (01) 234-5678, (012) 34-5678 "
        + "United Kingdom 0123 456 7890, 01234 567890 "
        + "United States (987) 456 7890, (987)-456-7890 etc.";
  1. How can I cover all mobile number formats?
  2. What are the minimum and maximum lengths of mobile numbers (with or without the country code)?
  3. How can I recognize whether a mobile number includes a country code?
+1  A: 

You might want to check whether this fits your needs: "A comprehensive regex for phone number validation".

paul_sns
There are no built-in capabilities for processing regular expressions on BlackBerry.
Yeti
My problem is still not completely solved because mobile number formats have many variations, but a similar topic is already well discussed there, so I am going to accept your answer.
Vivart
+1  A: 

From experience, I know how this works in my phone's OS: it looks for sufficiently long sequences of digits separated by a set of allowed characters.

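In principle something like the following, where min and max are the shortest and longest runs of characters you want to accept (the hyphen goes last in the character class so it is a literal rather than a range):

\+?([0-9]|[(). -]){min,max}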

This regex is suboptimal since it also looks for long sequences of separator chars. You will probably need to filter those results out as well.

It's a very simple method that produces some false positives, but IMO false positives are better than misses.
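A minimal sketch of this approach in Java SE, using java.util.regex. As the other answers point out, BlackBerry's Java does not ship regular expressions, so this runs on a desktop JVM rather than on the device. The class name PhoneCandidates, the method find, and the bounds (7 to 20 characters, 7 to 15 digits) are illustrative assumptions standing in for {min,max}; 15 digits is the maximum length of an international number under ITU-T E.164. The digit-count check is the "filter those results out" step the answer mentions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhoneCandidates {
    // Optional '+', then 7 to 20 characters that are digits or one of ( ) . space -
    // The hyphen is last in the class so it is a literal, not a range.
    // The 7..20 bounds are assumptions standing in for {min,max}.
    private static final Pattern CANDIDATE =
            Pattern.compile("\\+?(?:[0-9]|[(). -]){7,20}");

    // Returns each candidate reduced to its digits (a leading '+' is kept),
    // discarding runs whose digit count is outside [minDigits, maxDigits].
    public static List<String> find(String text, int minDigits, int maxDigits) {
        List<String> result = new ArrayList<>();
        Matcher m = CANDIDATE.matcher(text);
        while (m.find()) {
            String raw = m.group().trim();
            StringBuilder digits = new StringBuilder();
            for (int i = 0; i < raw.length(); i++) {
                char c = raw.charAt(i);
                if (c >= '0' && c <= '9') {
                    digits.append(c);
                }
            }
            // Drops the separator-only runs that the pattern also matches.
            if (digits.length() >= minDigits && digits.length() <= maxDigits) {
                result.add(raw.startsWith("+") ? "+" + digits : digits.toString());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "some text (987) 456 7890, (987)-456-7890 again some text";
        System.out.println(find(text, 7, 15)); // [9874567890, 9874567890]
    }
}
```

Commas and letters end a candidate, but numbers separated only by spaces would be merged into one run, which is the kind of false positive the answer accepts.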

disown
+1  A: 

You shouldn't use the list of samples you got as a guide to actual mobile phone numbers. For example, the number sequence shown for the Netherlands is incorrect, in that it doesn't cover just mobile numbers but ALL regular phone numbers (it doesn't cover things such as 0800 and 0900 numbers, for which different rules apply), and it is missing an element even for that. I can only assume the list is similarly incorrect for other countries (and of course it's far from complete, in that it doesn't cover all countries, but maybe you posted only a fragment).

To parse a phone number, you'd first have to remove all white space and other formatting characters from what could be a phone number, then check whether it has the correct length to be one, and then try to deduce whether it includes a country code. If it includes a country code but doesn't start with either 00 or + (both are used to indicate an international number), it might not be a phone number after all. Does it include an area code? If so, is that area code associated with mobile phones? (For example, in the Netherlands all mobile phone numbers have area code 06, BUT in the past this wasn't always the case, so in an old document a 06 area code may not indicate a mobile number.) After you've deduced that (and AFAIK mobile numbers always include an area code), you have to check, based on its length, whether the remaining digits could make up an actual phone number without the area code (hint: area code and number together have to be 10 digits long here, and I think everywhere).

And all that while taking into consideration that the rules may well be different for different countries or even different networks within some countries.

And of course if you find a number that looks like a valid phone number it still may not be. It could be some other number that just looks like a phone number but isn't.
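A sketch of the steps above for the Netherlands only, since that is the one country the answer gives concrete rules for (00 or + for international numbers, 10 digits for area code plus number, 06 for mobiles). It assumes the Dutch country code 31 and that the leading 0 of the area code is dropped after it; the class DutchNumberCheck and its Kind values are hypothetical names for illustration. As the answer warns, every other country (and possibly network) would need its own rules.

```java
public class DutchNumberCheck {
    // Hypothetical result categories for this sketch.
    public enum Kind { MOBILE, OTHER_DUTCH, UNRECOGNIZED }

    public static Kind classify(String candidate) {
        String s = candidate.trim();
        boolean plus = s.startsWith("+");
        if (plus) {
            s = s.substring(1);
        }
        // Step 1: strip white space and formatting characters; anything else
        // (letters, a second '+', ...) means it is probably not a phone number.
        StringBuilder b = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= '0' && c <= '9') {
                b.append(c);
            } else if (" ().-".indexOf(c) < 0) {
                return Kind.UNRECOGNIZED;
            }
        }
        String d = b.toString();
        // Step 2: country code? Both '+' and "00" mark an international number.
        String national;
        if (plus && d.startsWith("31")) {
            national = "0" + d.substring(2);   // +31 ... -> 0 ...
        } else if (!plus && d.startsWith("0031")) {
            national = "0" + d.substring(4);   // 0031 ... -> 0 ...
        } else if (!plus && d.startsWith("0") && !d.startsWith("00")) {
            national = d;                      // already in national form
        } else {
            // Another country code, or no recognizable prefix: no rules here.
            return Kind.UNRECOGNIZED;
        }
        // Step 3: area code + number must be 10 digits long (per the answer).
        if (national.length() != 10) {
            return Kind.UNRECOGNIZED;
        }
        // Step 4: 06 is the mobile area code (current numbering, per the answer).
        return national.startsWith("06") ? Kind.MOBILE : Kind.OTHER_DUTCH;
    }

    public static void main(String[] args) {
        System.out.println(classify("+31 6 12345678"));  // MOBILE
        System.out.println(classify("0031 20 1234567")); // OTHER_DUTCH
        System.out.println(classify("+7(555)123-123"));  // UNRECOGNIZED
    }
}
```

The common "+31 (0)6 ..." notation is not handled: the extra 0 makes the national number 11 digits, so it comes back as UNRECOGNIZED.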

jwenting
His list is wrong for Switzerland too. To my mind, detecting phone numbers reliably is difficult enough, and detecting the country code is as well. Detecting mobile numbers, when the prefixes are liable to change at any time, seems pretty much impossible, or fragile at best.
Benjol
With that list I just wanted to give a hint of what I want. Instead of writing some garbage text, I gave country names.
Vivart
+1  A: 

A simple search for all matching string formats is not the right way in this case. The best way is to use regular expressions to find all phone number matches, but BlackBerry Java doesn't have built-in support for regular expressions.

But you can use a third-party J2ME library that implements regex processing, something like this.
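If adding a regex library isn't an option, the "digits plus allowed separators" scan can also be written by hand. This sketch avoids java.util.regex, generics and StringBuilder, and uses only Vector and StringBuffer, which as far as I know CLDC also provides; it has not been verified against a BlackBerry toolchain. PhoneScanner and scan are hypothetical names, and the digit bounds are left to the caller.

```java
import java.util.Vector;

public class PhoneScanner {
    // Characters allowed between digits inside one phone number.
    private static boolean isSeparator(char c) {
        return c == ' ' || c == '(' || c == ')' || c == '.' || c == '-';
    }

    // Scans text without regular expressions and returns (as Strings) the
    // digit runs whose digit count lies in [minDigits, maxDigits].
    // A leading '+' is kept so international numbers stay recognizable.
    public static Vector scan(String text, int minDigits, int maxDigits) {
        Vector found = new Vector();
        StringBuffer digits = new StringBuffer();
        int n = text.length();
        int i = 0;
        while (i < n) {
            char c = text.charAt(i);
            boolean plus = c == '+';
            boolean digit = c >= '0' && c <= '9';
            if (!plus && !digit && c != '(') {
                i++; // not the start of a candidate
                continue;
            }
            // Start of a candidate: consume digits and separators.
            digits.setLength(0);
            if (plus) {
                digits.append('+');
                i++;
            }
            while (i < n) {
                char d = text.charAt(i);
                if (d >= '0' && d <= '9') {
                    digits.append(d);
                } else if (!isSeparator(d)) {
                    break;
                }
                i++;
            }
            int count = plus ? digits.length() - 1 : digits.length();
            if (count >= minDigits && count <= maxDigits) {
                found.addElement(digits.toString());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Vector v = scan("some text (987) 456 7890, (987)-456-7890 again", 7, 15);
        for (int k = 0; k < v.size(); k++) {
            System.out.println(v.elementAt(k)); // 9874567890 (printed twice)
        }
    }
}
```

Like the regex version, it merges numbers separated only by spaces and relies on the digit-count bounds to throw away obvious non-numbers.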

Yeti
Thanks, Yeti, you have solved another problem of mine. I was also searching for a third-party regex library.
Vivart