I have a table which is full of arbitrarily formatted phone numbers, like this:

027 123 5644
021 393-5593
(07) 123 456
042123456

I need to search for a phone number in a similarly arbitrary format (e.g. 07123456 should find the entry (07) 123 456).

The way I'd do this in a normal programming language is to strip all the non-digit characters out of the 'needle', then go through each number in the haystack, strip all non-digit characters out of it, and compare it against the needle, e.g. (in Ruby):

digits_only = lambda { |n| n.gsub(/[^\d]/, '') }

needle = digits_only[input_phone_number]
haystack.map(&digits_only).include?(needle)
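For reference, a self-contained version of the Ruby comparison above, using the sample numbers from the question as a stand-in haystack. The `phone_matches?` helper and the hard-coded `haystack` array are illustrative only; in the real setup the haystack would come from the table.

```ruby
# Strip every non-digit character, e.g. "(07) 123 456" -> "07123456"
digits_only = lambda { |n| n.gsub(/\D/, '') }

# Sample haystack taken from the question (stand-in for the table's rows)
haystack = ['027 123 5644', '021 393-5593', '(07) 123 456', '042123456']

# Hypothetical helper: true if any haystack entry has the same digits as the input
def phone_matches?(haystack, input, normalize)
  needle = normalize.call(input)
  haystack.map(&normalize).include?(needle)
end

puts phone_matches?(haystack, '07123456', digits_only)
puts phone_matches?(haystack, '07 999 999', digits_only)
```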

The catch is, I need to do this in MySQL. It has a host of string functions, none of which really seem to do what I want.

Currently I can think of two 'solutions':

  • Hack together a franken-query of CONCAT and SUBSTR
  • Insert a % between every character of the needle (so it's like this: %0%7%1%2%3%4%5%6%)

However, neither of these seems like a particularly elegant solution.
Hopefully someone can help, or I might be forced to use the %%%%%% solution.
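A sketch of the second idea in Ruby: build the `%0%7%1%2%3%4%5%6%` pattern from any input format, plus a rough local emulation of `LIKE` for patterns that only use `%` (no `_` or escapes). Both helper names are hypothetical; in MySQL the pattern would simply be bound to `phonenumber LIKE ?`.

```ruby
# Build the "%0%7%1%2%3%4%5%6%" LIKE pattern from the digits of any input
def like_pattern(input)
  '%' + input.gsub(/\D/, '').chars.join('%') + '%'
end

# Rough emulation of SQL LIKE for %-only patterns:
# each % becomes ".*" and every other piece is matched literally
def like_match?(value, pattern)
  regex = pattern.split('%', -1).map { |part| Regexp.escape(part) }.join('.*')
  Regexp.new("\\A#{regex}\\z", Regexp::MULTILINE).match?(value)
end

pattern = like_pattern('07123456')
puts pattern
puts like_match?('(07) 123 456', pattern)
puts like_match?('(07) 123 4567', pattern) # extra digits also match
```

Note the last case: because `%` matches anything, including digits, a longer number that contains the same digits in order is also a match, so this pattern can return false positives.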

Update: This is operating over a relatively fixed set of data, with maybe a few hundred rows. I just didn't want to do something ridiculously bad that future programmers would cry over.

If the dataset grows I'll take the 'phoneStripped' approach. Thanks for all the feedback!

+2  A: 

An out-of-the-box idea, but could you use a "replace" function to strip out any instances of "(", "-" and " ", and then use an "isnumeric" function to test whether the resulting string is a number?

Then you could do the same to the phone number string you're searching for and compare them as integers.

Of course, this won't work for numbers like 1800-MATT-ROCKS. :)
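A minimal Ruby sketch of this idea, assuming only the characters named above ("(", ")", "-" and space) are stripped; `strip_formatting` and `numeric?` are hypothetical names, the latter standing in for an "isnumeric" check.

```ruby
# Remove only "(", ")", "-" and space, as suggested; other characters stay
def strip_formatting(number)
  number.gsub(/[()\- ]/, '')
end

# Stand-in for an "isnumeric" check: the stripped string must be all digits
def numeric?(str)
  str.match?(/\A\d+\z/)
end

stored = strip_formatting('(07) 123 456')
search = strip_formatting('07123456')
puts numeric?(stored) && numeric?(search) && stored.to_i == search.to_i
puts numeric?(strip_formatting('1800-MATT-ROCKS'))
```

One caveat with comparing as integers: leading zeros are dropped (`'07123456'.to_i` is `7123456`), so `0712...` and `712...` would compare equal. Comparing the stripped strings avoids that.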

Matt Hamilton
A: 

could you use a "replace" function to strip out any instances of "(", "-" and " ",

I'm not concerned about the result being numeric. The main characters I need to consider are +, -, (, ) and space. So would that solution look like this?

SELECT * FROM people
WHERE
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(phonenumber, '(', ''), ')', ''), '-', ''), ' ', ''), '+', '')
LIKE '123456'

Wouldn't that be terribly slow?
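Rather than hand-writing the nesting, the expression can be generated. A sketch in Ruby (the `nested_replace` helper is hypothetical) that wraps the column in one three-argument MySQL `REPLACE(str, from_str, to_str)` call per character, replacing each with the empty string. The characters are interpolated directly, which is fine for this fixed list but would need escaping for a quote character. On speed: because the column is wrapped in functions, an ordinary index on `phonenumber` cannot be used, so each query scans the table.

```ruby
# Wrap a column in one REPLACE(str, from_str, '') call per character to remove
def nested_replace(column, chars)
  chars.reduce(column) { |expr, ch| "REPLACE(#{expr}, '#{ch}', '')" }
end

sql = "SELECT * FROM people WHERE " \
      "#{nested_replace('phonenumber', ['(', ')', '-', ' ', '+'])} = ?"
puts sql
```

The `?` placeholder would be bound to the search term after stripping the same characters from it.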

Orion Edwards
+1  A: 

This is a limitation of MySQL: its regex functions can only match, not replace. See this post for a possible solution.

crono
A: 

Wouldn't that be terribly slow?

You didn't say you needed it to be fast! :)

Matt Hamilton
A: 

Is it possible to run a query to reformat the data to match a desired format and then just run a simple query? That way, even if the initial reformatting is slow, it doesn't really matter.

megabytephreak
A: 

MySQL can search based on regular expressions.

Sure, but given the arbitrary formatting, if my haystack contained "(027) 123 456" (bear in mind the position of spaces can change; it could just as easily be 027 12 3456) and I wanted to match it with 027123456, would my regex therefore need to be this?

"^[\D]+0[\D]+2[\D]+7[\D]+1[\D]+2[\D]+3[\D]+4[\D]+5[\D]+6$"

(Actually it'd be worse, as the MySQL manual doesn't seem to indicate that it supports \D.)

If that is the case, isn't it more or less the same as my %%%%% idea?
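A sketch, in Ruby, of generating the pattern instead of typing it (the `digit_regex` helper is hypothetical). It uses the bracket expression `[^0-9]` in place of `\D`, and `*` rather than `+`, so that zero separators between digits are allowed; with `+`, an unformatted entry such as 027123456 would not match. The resulting string would be passed to MySQL's `REGEXP` operator, e.g. `WHERE phonenumber REGEXP ?`. Ruby's regex engine accepts the same syntax, so the pattern can be checked locally.

```ruby
# Allow any run of non-digits (including none) before, between and after the digits
def digit_regex(input)
  digits = input.gsub(/\D/, '').chars
  '^[^0-9]*' + digits.join('[^0-9]*') + '[^0-9]*$'
end

pattern = digit_regex('027123456')
puts pattern
['(027) 123 456', '027 12 3456', '027123456', '0271234567'].each do |number|
  puts "#{number}: #{Regexp.new(pattern).match?(number)}"
end
```

Unlike the `%` version, the `^`/`$` anchors and the digit-free gaps mean a number with extra digits (the last sample) does not match.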

Orion Edwards
A: 

Just an idea, but couldn't you use a regex to quickly strip out the characters and then compare against that, like @Matt Hamilton suggested?

Maybe even set up a view (I'm not sure about MySQL's support for views) that holds all the phone numbers, stripped by regex to plain digits?

crucible
A: 

Woe is me. I ended up doing this:

# Build a LIKE pattern such as "%0%7%1%2%3%4%5%6" from the digits of the input
mre = mobile_number && ('%' + mobile_number.gsub(/\D/, '').scan(/./m).join('%'))

find(:first, :conditions => ['trim(mobile_phone) like ?', mre])

Orion Edwards
A: 

If this is something that is going to happen on a regular basis, perhaps modify the data so it is all in one format, and then set up the search form to strip out any non-alphanumeric characters (if you allow numbers like 310-BELL). Having data in an easily searched format is half the battle.
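A minimal Ruby sketch of this approach, assuming the canonical format is "letters and digits only, upper-cased" (the `canonical_phone` name and that format are assumptions). An array of hashes stands in for the table: existing rows are rewritten once, and the search input goes through the same function.

```ruby
# Hypothetical canonical form: keep only letters and digits, upper-cased,
# so "310-Bell" and "310 bell" both become "310BELL"
def canonical_phone(input)
  input.gsub(/[^0-9A-Za-z]/, '').upcase
end

# One-off clean-up of existing rows (stand-in for an UPDATE over the table)
rows = [{ id: 1, phone: '(07) 123 456' }, { id: 2, phone: '310-Bell' }]
rows.each { |row| row[:phone] = canonical_phone(row[:phone]) }

# The search form applies the same function to the user's input
search = canonical_phone('310 bell')
p rows.find { |row| row[:phone] == search }
```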

Tanj
+4  A: 

This looks like a problem from the start. Any kind of searching you do will require a table scan, and we all know that's bad.

How about adding a column with a hash of the current phone numbers after stripping out all formatting characters? Then you can at least index the hash values and avoid a full-blown table scan.

Or is the amount of data small and not expected to grow much? Then maybe just pull all the numbers into the client and run the search there.
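A sketch of the hash-column idea in Ruby, using `Zlib.crc32` as the hash (an assumption; any stable hash would do) and a `group_by` result standing in for an index on that column. `phone_hash` and `find_phone` are hypothetical names. Since different numbers can share a hash value, the lookup re-checks the stripped digits.

```ruby
require 'zlib'

# Strip formatting, then hash the remaining digits
def phone_hash(number)
  Zlib.crc32(number.gsub(/\D/, ''))
end

# Stand-in for the table: each row keeps the raw number plus its hash column
rows = ['027 123 5644', '021 393-5593', '(07) 123 456', '042123456'].map do |raw|
  { raw: raw, hash: phone_hash(raw) }
end

# Stand-in for an index on the hash column: hash value -> matching rows
index = rows.group_by { |row| row[:hash] }

# Look up by hash, then confirm the digits to rule out collisions
def find_phone(index, input)
  digits = input.gsub(/\D/, '')
  (index[phone_hash(input)] || []).select { |row| row[:raw].gsub(/\D/, '') == digits }
end

p find_phone(index, '07123456').map { |row| row[:raw] }
```

Storing the stripped digits themselves in an indexed column would work the same way and avoids the collision check.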

John Dyer
+1  A: 

My solution would be something along the lines of what John Dyer said. I'd add a second column (e.g. phoneStripped) that gets stripped on insert and update. Index this column and search on it (after stripping your search term, of course).

You could also add a trigger to automatically update the column, although I've not worked with triggers. But like you said, it's really difficult to write the MySQL code to strip the strings, so it's probably easier to just do it in your client code.
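A client-side sketch in Ruby of keeping a `phoneStripped`-style column in sync on insert and update. `PeopleTable` is a hypothetical in-memory stand-in: the `@index` hash plays the role of the database index on the stripped column, and searches strip the term the same way before using it.

```ruby
# In-memory stand-in for a table with a derived, indexed phone_stripped column
class PeopleTable
  Person = Struct.new(:id, :phonenumber, :phone_stripped)

  def initialize
    @rows = {}
    @index = Hash.new { |h, k| h[k] = [] } # phone_stripped -> ids
    @next_id = 1
  end

  def self.strip(number)
    number.gsub(/\D/, '')
  end

  # Fill the stripped column on insert and add the row to the index
  def insert(phonenumber)
    person = Person.new(@next_id, phonenumber, self.class.strip(phonenumber))
    @rows[person.id] = person
    @index[person.phone_stripped] << person.id
    @next_id += 1
    person
  end

  # Recompute the stripped column on update and move the index entry
  def update(id, phonenumber)
    person = @rows.fetch(id)
    @index[person.phone_stripped].delete(id)
    person.phonenumber = phonenumber
    person.phone_stripped = self.class.strip(phonenumber)
    @index[person.phone_stripped] << id
    person
  end

  # Strip the search term the same way, then look it up in the index
  def find_by_phone(input)
    @index.fetch(self.class.strip(input), []).map { |id| @rows[id] }
  end
end

table = PeopleTable.new
table.insert('(07) 123 456')
bob = table.insert('021 393-5593')
table.update(bob.id, '+64 21 393 5593')
p table.find_by_phone('07123456').map(&:phonenumber)
p table.find_by_phone('021 393 5593').map(&:phonenumber)
```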

(I know this is late, but I just started looking around here :)

Michael Johnson
A: 

A possible solution can be found at http://udf-regexp.php-baustelle.de/trac/

An additional package needs to be installed; then you can play with REGEXP_REPLACE.

A: 

See

http://www.nogid.org/find-phone-number-in-database-format-independent

It is not really an issue that the regular expression becomes visually appalling, since only MySQL "sees" it. Note that instead of '+' (cf. the OP's post with [\D]) you should use '*' in the regular expression.

Some users are concerned about performance (non-indexed search), but in a table with 100,000 customers this query, when issued from a user interface, returns immediately, without noticeable delay.

Grbts