views:

93

answers:

1

My application wants to store a list of international phone number in a mysql database. Then the application would need to query the database and search for a specific number. Sounds simple but it's actually a huge problem.

Because users can search for that number in a different format we'll have to do a full scan to the database each time.

For example. We might have the number 17162225555 stored in the database (along with another 5 million entries). Now the user comes along and attempt to search using 7162225555. Another user might try to serach with 2225555. etc etc. So in other words, the database have to issue the SQL query using a "like %number%" which would result in a full scan.

How should we design this application? Is there some way to tweat the Mysql to handle this better? Or should we not use SQL at all?

PS. We have millions of entries, and 10s of these search request per second.

+6  A: 

This is very weird, I've struggled with this issue myself many times, over the last 15 years and generally come up with structures that separate area codes, country codes and number into separate fields etc. But whilst reading your question another solution just popped into my head, it does require a separate field though so may not be appropriate for you.

You could have a separate field called reverse_phone_number, have this automatically populated by the DB engine then when people search simply reverse the search string and use the indexed reverse field with just a % at the end of the like string, thereby allowing the use of the index.

Dependant on your DB engine you may be able to create an index based on a user-defined function that does the reverse for you obviating the need for an additional field.

In some countries, e.g. the UK, you may have an issue with leading zeros. A UK phone number is represented as (area code)(Phone Number) e.g. 01634 511098, when this is internationalised the leading zero of the area code is removed and the international dial code (+ or 00) and the country code (44) are added. This results in an international phone number of +441634511098. Any user searching for 0163451109 would not find the phone number if it was entered in internationalised format. You can overcome this issue by removing leading zeros from the search string.

EDIT Based on suggestions from Ollie Jones you should store the number as entered by the user and then strip leading zeros, punctuation and white space from the number before reversing and storing in the reversed field. Then simply use the same algorithm to strip the search string before reversing, find the record and then display the originally entered number back to the user.

Steve Weet
Steve that reverse # field is genius.
JNK
To make sure I get what you mean, you'd have a second table storing all their queries, with a bridge table matching the phone numbers it brought up? Automatically populated for every new query, so if somebody enters an existing search, it would just look at the old results? Nifty idea.
Slokun
doh! you beat me to it by a few seconds. I think the reverse_phone_number field is a very good solution, as long as we can assume that the user will always know the last digits of the phone number they are searching for.
Peter Recore
@Slokun I don't think that's what steve is proposing.
Peter Recore
@slokun. I was proposing a second column within the same table with the content being the reverse of the telephone number as most people will query based on the end digits
Steve Weet
@Steve Skimmed a bit too quickly, misread what you wrote. I see what you mean.
Slokun
The additional field is a non issue for us as this is a new project from scratch. But can you further explain the leading zeros problem? Lost you there.
erotsppa
This is right on, and works great, Steve. But the reverse phone number field also needs to omit the punctuation, so that +1.800.555.1212 gets stored as 21215550081 . Then when searching, you also strip the punctuation. It is also a good idea to store an index field containing the stripped number in the forward direction, that is 18005551212. Then, when your user presents a phone number to find, strip it, then search for both. Works great in my shop for stuff like medical record numbers. Always display the phone number WITH the user-supplied punctuation.
Ollie Jones
@erotsppa Edited to better explain the "leading zeros" issue and to incorporate recommendations from @Ollie Jones
Steve Weet
@Ollie Jones. +1 Great suggestions answer edited to reflect them
Steve Weet