I have a database with millions of phone numbers in free-for-all formatting, i.e., the UI does not enforce any constraints and users type in whatever they want.

What I'm looking for is a Java API that can make a best effort to convert these into a consistent format. Ideally, the API would take the free-text value and a country code and either produce a valid international phone number or throw an exception.

For example, a phone number in the system might look like any of the following:

(555) 478-1123
555-478-1123
555.478.1123
5554781123

Given the country of US, the API would produce the value "+1 (555) 478-1123" for all these. The exact format does not matter, as long as it's consistent.

There are also numbers in the system without area codes, such as "478-1123". In that case, I would expect a NoAreaCodeException, or something similar.

There could also be data such as "abc", which should also throw exceptions.

Of course, there are countless variations of the examples I have posted, as well as the enormous complication of international phone numbers, which have quite complicated validation rules. This is why I would not consider rolling my own.

Has anyone seen such an API?

+3  A: 

I don't know of such an API, but it looks like it could be done using regular expressions. You probably can't convert every number to a valid format, but you should be able to convert most of them.

Olvagor
Was my answer too.
Ikke
Here is what scared me away from trying to roll my own. Take a look at the phone number rules for just Australia: http://en.wikipedia.org/wiki/%2B61
Chase Seibert
+1  A: 

There are commercial programs that format and validate international telephone numbers, like this one which even checks for valid area codes in some countries. For North America, the NANPA provides some resources for validating area codes.

Zach Scrivena
That Perl API is the best I have seen so far. I would not personally use it because it's commercial and non-Java, but it would be excellent for some projects.
Chase Seibert
+6  A: 

You could write your own (for US phone # format):

  • Strip any non-numeric characters from the string
  • Check that the remaining string is ten characters long
  • Put parentheses around the first three characters and a dash between the sixth and seventh character.
  • Prepend "+1 " to the string
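
A minimal Java sketch of the steps above, for illustration only. The class name UsPhoneFormatter and its format method are hypothetical and do not come from any library. It assumes the input holds exactly ten digits, so a number typed with a leading "1" is rejected rather than trimmed, and it adds a space after the closing parenthesis to match the "+1 (555) 478-1123" format from the question.

```java
// Hypothetical helper; the names here are illustrative, not from any library.
final class UsPhoneFormatter {

    private UsPhoneFormatter() {
        // Static utility class, not meant to be instantiated.
    }

    /**
     * Normalizes a free-text US phone number to the form "+1 (555) 478-1123".
     *
     * @throws IllegalArgumentException if the input is null or does not
     *         contain exactly ten digits (e.g. a missing area code, or "abc")
     */
    static String format(String raw) {
        if (raw == null) {
            throw new IllegalArgumentException("Phone number is null");
        }

        // Step 1: strip every non-numeric character.
        String digits = raw.replaceAll("[^0-9]", "");

        // Step 2: check that exactly ten digits remain.
        if (digits.length() != 10) {
            throw new IllegalArgumentException(
                    "Expected 10 digits but found " + digits.length() + " in: " + raw);
        }

        // Steps 3 and 4: add parentheses, the dash, and the "+1 " prefix.
        return "+1 (" + digits.substring(0, 3) + ") "
                + digits.substring(3, 6) + "-" + digits.substring(6);
    }
}
```

With this sketch, all four sample inputs from the question come out as "+1 (555) 478-1123", while "478-1123" and "abc" raise an IllegalArgumentException. It does not check that the area code actually exists.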
Bill the Lizard
That would work for US phone numbers. I was hoping for a generic international solution as well.
Chase Seibert
I understand. You'd have to implement a separate format method for each country you're interested in that uses a different phone number format.
Bill the Lizard
This is basically what I ended up doing. You can download some sample code for US and UK phone numbers from my blog: http://bitkickers.blogspot.com/2009/02/java-phone-number-format-api.html
Chase Seibert
@Chase: Thanks for posting that.
Bill the Lizard
A: 

I don't think there is a way of recognizing the lack of an area code unless your numbers are all from one country (presumably the USA), as each country has its own rules for things like area codes.

I'd start looking for detailed information here, here, and here - if there are APIs to handle it (in Java or otherwise), they might be linked to there as well.

Michael Borgwardt
Yeah, this is why I was thinking the API should take a country code parameter.
Chase Seibert
A: 

The best I found was javax.telephony, available here: http://java.sun.com/products/javaphone/

It has an Address class, but sadly that class does not solve your problem :( Well, maybe you can find a solution by digging deeper into it.

Apart from that, my first idea was to use regex. However, that seems like a poor solution to this specific problem.

mafutrct
This would be a cool piece of functionality to include in the Java Phone API spec, but I agree it does not do this right now.
Chase Seibert
A: 

You could try this Java phone number formatting library: http://code.google.com/p/libphonenumber/

It has data for hundreds of countries and formats.

g1smd