views:

316

answers:

5

Hi,

we a phonenumber field in our database and I would like to do a simple lookup query like:

SELECT * FROM TABLE WHERE Phonenumber = '555123456'

But since the phonenumbers are entered by users and are not normalized, we don't really know what they look like.

Could be:

  • +555-123456

or

  • (555) 123 456

or

  • 555-12-34-56

or something complety different.

The only thing certain is that the all the given numbers should be there in the correct order. Is it possible to construct a query around that?

+2  A: 

In Oracle:

SELECT  *
FROM    mytable
WHERE   REGEXP_REPLACE(Phonenumber, '[^0-9]', '') = '5551234567'

In SQL Server 2005+:

WITH    digits AS
        (
        SELECT  1 AS digit
        UNION ALL
        SELECT  digit + 1
        FROM    digits
        WHERE   digit <= 100
        )
SELECT  *
FROM    mytable
WHERE   (
        SELECT  SUBSTRING(number, digit, 1) AS [text()]
        FROM    digits
        WHERE   SUBSTRING(number, digit, 1) BETWEEN '0' AND '9'
        FOR XML PATH('')
        ) = '5551234567'

, or, if you want to see the normalized phone value,

WITH    digits AS
        (
        SELECT  1 AS digit
        UNION ALL
        SELECT  digit + 1
        FROM    digits
        WHERE   digit <= 100
        ),
        phones AS
        (
        SELECT  m.*,
                (
                SELECT  SUBSTRING(number, digit, 1) AS [text()]
                FROM    digits
                WHERE   SUBSTRING(number, digit, 1) BETWEEN '0' AND '9'
                FOR XML PATH('')
                ) AS nphone
        FROM    mytable m
        )
SELECT  *
FROM    phones
WHERE   nphone = '5551234567'

However, you better create another column for normalized phone values, fill it in a trigger and index it, so you can query more efficiently.

Quassnoi
A: 

Since I don't know what RDBMS you're looking for, I'll give the most generic way:

phonenumber like '%5%5%5%1%2%3%4%5%6%'

This assumes that all phone numbers are at least equal length (in digits).

Eric
Length should be exactly equal. This will also match something like `55555555123456`
Quassnoi
@Quassnoi: Hence I put in the clause about the phone numbers needing to be equal in digits. Without the OP's RDBMS, though, it's the best generic solution.
Eric
This could have an impact on performance as indexes are severely limited (or not used at all) with LIKE clauses.
Nate
This assumes that every phone number has the digits 555123456 in it. The question says that the users enter the phone numbers, so I doubt they are all the same.
Macho Matt
+1  A: 

You can try something like:

SELECT * FROM TABLE WHERE REPLACE( REPLACE( REPLACE( REPLACE(phone,'(','') ,')','') ,'-','') ,'+','') ,' ','') = '0398765432'

Replace any non numeric value with an empty string.

Max
+4  A: 

IF you can alter the table (assuming it's SQL Server 2005 and up), you could add a computed column to your table, and persist it. This column could hold a "cleaned up" representation of your "phonenumber" field.

Something like this:

 create function dbo.CleanPhone(@phone varchar(100))
 returns varchar(100)
 with schemabinding
 as begin
   return
     replace(replace(replace(replace(replace(replace(@phone, ' ', ''), 
             '-', ''), '(', ''), ')', ''), '-', ''), '+', '')
 end

and then:

alter table (yourtable)
 add cleanedPhone as dbo.CleanPhone(Phone) persisted

Now, your "CleanedPhone" column would always contained a "cleaned up" version of your phone number - always something like: 555123456.

Since it's a PERSISTED field, you don't incur a performance penalty when querying, either - the value is created and stored in your table, and is available as a normal column.

On this, you could now query quite easily.

Marc

marc_s
+1  A: 

ALternatively, you could fix your data entry pages to strip all nonnumeric characters before entry into the database and then fix all existing data to strip the numbers. Then put a constraint on the field that requires all characters to be numeric. Then you don't need to keep looking for nonnumeric characters every time you want to display a phone number and you can easily have the user interface consistently show them in whatever format you prefer them to be shown in. Where possible, fixing the problem (ie that you didn't put proper controls in your user interface) is better than creating workarounds which will usually create performance slowdowns for no good reason.

HLGEM