views:

50

answers:

4

I have a MySQL table containing phone numbers that may be formatted in many different ways. For instance:

  • (999) 999-9999
  • 999-999-9999
  • 999.999.9999
  • 999 999 9999
  • +1 999 999 9999
  • 9999999999
  • 019999999999
  • etc.

The field is stored as a string. All I want to do is return the field with anything non-numeric removed. What SQL could be used to do this in MySQL? So the values above would all return either 9999999999, 19999999999, or 019999999999.

Note this is for a query that will not be run regularly, so I don't need to worry about optimization, indexing, or anything else. However, I do need to include this into a fairly complex join with several other tables. At the most basic level, I'm attempting to find duplicate records and want to group by or at least order by phone number.

+2  A: 

You could use the REPLACE command encapsulating the whole string with the different variations in order to remove them.

Ardman
@Ardman: I have no idea what variations will have been used, as the data has been entered as free text. I've already attempted the following, but it gets very cumbersome to use and was hoping for a simpler solution: `replace(replace(replace(replace(replace(phone,' ', ''),'.',''),'-',''),'(',''),')','')`
Tauren
there's a UDF available for regex replacments (referred to in http://stackoverflow.com/questions/986826/how-to-do-a-regular-expression-replace-in-mysql). Or you could write your own UDF with a substring and ascii code check.
potatopeelings
@potatopeelings: that looks sweet! thanks. you should add an answer to this question so I can select it.
Tauren
+1  A: 

You have to normalize these numbers before inserting them into database.
Make it stored in numbers only.
Otherwise it would be pain in the neck to get it right.

Col. Shrapnel
@Col: Yes, of course if I had control over it. But I'm dealing with legacy data and can't change it in the productions system.
Tauren
@Tauren: For something like this, you should always store just the numbers and then format it the same whenever it gets displayed. You're relying too heavily on MySQL here, this is a PHP task. I know you *think* you can't do this without MySQL, but if you really *must* do this in MySQL, then the system you're using does not fit your needs. There must be a way that you could do this in PHP.
animuson
A: 

This is the work for small regexp but unfortunately MySQL does not support regexp replace (unless you want to recompile it from sources). So I'd say do the cleaning on the code side, it'll be much easier.

vava
@vava: that's the verdict I had come to as well, but figured I'd ask the SO community to see if there were any non-code solutions I wasn't aware of. As I mention in a comment to @Ardman's answer, using multiple nested `replace(...)` commands isn't a great solution. Too bad there isn't a regex replace.
Tauren
A: 

There's a UDF available for regex replacments (referred to in http://stackoverflow.com/questions/986826/how-to-do-a-regular-expression-replace-in-mysql).

Or you could write your own UDF using a substring and ASCII code check.

potatopeelings
This is definitely the slickest solution posted that answers my specific question. However, now that I've looked into using the UDF, it really isn't practical for just a quick one time query. It requires mysql source code to be available and the UDF to be compiled. It would be nice if there was an easy package file to install via apt-get, but I haven't found one. Great answer regardless, and something I might use in the future!
Tauren