Hello stackers!

I've got a wonderfully fun little SQL problem to solve today and thought I'd ask the community to see what solutions you come up with.

We use a really cool email-to-text service: you just send an email to [email protected] and it sends a text message to the desired phone number.

For example, to send a text to 0790 0006006, you send an email to [email protected]. Pretty neat, huh?

The problem is with the phone numbers in our database. Most of the phone numbers are fine, but some of them have "rubbish" mixed in with the phone number.

Take these wonderful examples of the rubbish you need to deal with (I've anonymised the phone numbers by placing zeroes in):

07800 000647(mobile)
07500 000189 USE 1ST
SEE NOTES
07900 000415 HO ONLY
try 1st 0770 0000694 then home
07500 000465 Cannot

Requirements

The solution needs to be in SQL (for MS SQL server).

So the challenge is as follows: we need to get the phone number without spaces and without any of the rubbish seen in the samples.

For example:

This:

try 1st 0770 0000694 then home

Should become this:

07700000694

Anything without a phone number in it (e.g. "SEE NOTES") should be null.

UPDATE:

Thanks for the great responses! We've had some interesting answers, but seeing as none of the SQL answers have had any votes, it's a bit hard to pick a favourite. I'd rather have seen a clear favourite picked by the community.

I'll let the question mature a little more and see if any votes come in before I award an answer.

A: 

Based on your samples, it looks like for the most part you just need to remove any non-numeric characters and spaces from the string (I don't recall the SQL function for this, but it's trivial). The only exception is with things like "1st" or "2nd", and you could get rid of these before stripping out non-numeric characters with a bunch of REPLACE(MobilePhone, '1ST', '')-type statements.

There may be a lot more odd situations in your data where people include actual digits that aren't really part of their phone number. I'm not sure how you find and fix all of these, other than by just dealing with them whenever you happen to spot them.

There are doubtless many third-party components that handle phone number parsing, but I don't know if any can be used directly from SQL Server. Probably some of them can, depending on your version of SQL Server. A Google search on "parse phone numbers in SQL Server" gives a bunch of options.
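A rough sketch of the approach described above, not a definitive implementation. It assumes SQL Server 2008 or later (for inline DECLARE initialisation) and a case-insensitive collation, so that REPLACE also catches '1ST'. The variable names @s and @pos and the sample value are purely illustrative.

```sql
-- Illustrative sample value taken from the question; @s is a hypothetical name.
DECLARE @s varchar(100) = 'try 1st 0770 0000694 then home';

-- Remove ordinal words first so their digits do not leak into the result,
-- then drop spaces.
SET @s = REPLACE(REPLACE(@s, '1st', ''), '2nd', '');
SET @s = REPLACE(@s, ' ', '');

-- Repeatedly delete the first non-digit character until none remain.
DECLARE @pos int = PATINDEX('%[^0-9]%', @s);
WHILE @pos > 0
BEGIN
    SET @s = STUFF(@s, @pos, 1, '');
    SET @pos = PATINDEX('%[^0-9]%', @s);
END

-- An input with no digits at all (e.g. 'SEE NOTES') ends up as NULL.
SELECT NULLIF(@s, '') AS CleanNumber;
```

Any other stray digits in the text would still end up in the result, so this only works when the ordinals are the sole extra digits.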

MusiGenesis
A: 

DECLARE @test varchar(100)
DECLARE @result varchar(100)
SET @test='07800 000647(mobile)'

SET @result=''
SELECT
@result=@result+CASE WHEN number LIKE '[0-9]' THEN number ELSE '' END FROM
(
SELECT SUBSTRING(@test,number,1) AS number FROM
(
SELECT NUMBER FROM Master..spt_values WHERE type='p' AND number between 1 and len(@test)
) AS temp
) AS temp
SELECT @result

As MusiGenesis says though, you have to deal with anything like 1st and 2nd separately.

ho1
A: 

Looks like you could step along looking for long contiguous strings of numbers: (quick & dirty)

CREATE FUNCTION fnRipMsisdn(@STRING VARCHAR(28)) RETURNS VARCHAR(28) AS
BEGIN
DECLARE @I INT, @RESULT VARCHAR(28), @CHAR CHAR, @CONCURRENT_ALPHA INT
SET @I = 0
SET @RESULT = ''
SET @CONCURRENT_ALPHA = 0
SET @STRING = REPLACE(@STRING, ' ', '') --replace chars that can delimit an msisdn

WHILE @I < LEN(@STRING) BEGIN
    IF LEN(@RESULT) >= 13 --MAX LEN
        BREAK
    SET @I = @I + 1
    SET @CHAR = SUBSTRING(@STRING, @I, 1)
    IF @CHAR LIKE '[0-9]' AND @CONCURRENT_ALPHA < 1 BEGIN
        SET @CONCURRENT_ALPHA = 0
        SET @RESULT = @RESULT + @CHAR
    END ELSE BEGIN
        SET @CONCURRENT_ALPHA = @CONCURRENT_ALPHA + 1
        IF LEN(@RESULT) <= 9 BEGIN --MIN LEN
            --run of digits too short to be a number: discard it and keep scanning
            SET @RESULT = ''
            SET @CONCURRENT_ALPHA = 0
        END
    END
END
RETURN CASE WHEN @RESULT = '' THEN NULL ELSE @RESULT END
END

select dbo.fnRipMsisdn('07800 000647(mobile)')
select dbo.fnRipMsisdn('07500 000189 USE 1ST')
select dbo.fnRipMsisdn('SEE NOTES')
select dbo.fnRipMsisdn('07900 000415 HO ONLY')
select dbo.fnRipMsisdn('try 1st 0770 0000694 then home')
select dbo.fnRipMsisdn('07500 000465 Cannot')

07800000647
07500000189
NULL
07900000415
07700000694
07500000465
Alex K.
A: 

The solution I have come up with so far is as follows:

SELECT 
CASE WHEN ISNUMERIC(SUBSTRING(REPLACE(MobilePhone, ' ', ''), 1, 11)) = 1 
THEN SUBSTRING(REPLACE(MobilePhone, ' ', ''), 1, 11) + '@emailservice.com' 
ELSE NULL END AS EmailToTextAddress
FROM Contacts

However, this won't deal with rubbish at the start of the phone number.

It also assumes that a phone number (without spaces) is 11 characters long, which allows me to deal with numeric characters that aren't part of the phone number (as in MusiGenesis's answer).

DoctaJonez
Be careful with ISNumeric, it will also return 1 for valid currency symbols. The list of those symbols is http://msdn.microsoft.com/en-us/library/ms188688.aspx
Jon
Thanks Jon, that's very interesting.
DoctaJonez
You might be able to get around the problem with currency symbols by adding .0e0 to the telephone number before you do ISNUMERIC.
ho1
It also thinks a '+', tab (char(9)) and new line are numeric.
Alex K.
@Alex K., Thanks, I forgot about those, also minus (-)
Jon
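To illustrate the ISNUMERIC caveats raised in these comments, here is a small sketch that compares ISNUMERIC with a stricter digits-only test. The candidate values and the column names Candidate, IsNumericResult and DigitsOnly are made up for the example, and the VALUES table constructor assumes SQL Server 2008 or later.

```sql
-- Hypothetical candidates for illustration only.
SELECT v.Candidate,
       ISNUMERIC(v.Candidate) AS IsNumericResult,
       -- Stricter test: non-empty and contains no character outside 0-9.
       CASE WHEN v.Candidate <> '' AND v.Candidate NOT LIKE '%[^0-9]%'
            THEN 1 ELSE 0 END AS DigitsOnly
FROM (VALUES ('07800000647'), ('$'), ('+'), ('-'), ('0780000064x')) AS v(Candidate);
```

The NOT LIKE '%[^0-9]%' pattern rejects anything containing a character other than a digit, so currency symbols and signs do not slip through.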
+1  A: 

Your best bet is to fix the data. If you can't fix the data, then put in a new calculated field that strips out the characters you don't want. In any event, start now to put controls on data entry in that field in your application. You don't honestly want to waste processing power doing this kind of data manipulation with every query; do it once when the data is entered and be done with it.

HLGEM
I fully agree with you. Luckily this is a one shot query today, we won't be doing this kind of report often. This data is from a legacy system that we are migrating away from. Our data validation is much better in the new system, we won't have this problem.
DoctaJonez
+2  A: 

Assuming that your phone numbers always start with '07' and the length is 12 characters (including one space), you can try something like this:

DECLARE @Number varchar(50)

--SET @Number='07800 000647(mobile)'
--SET @Number='07500 000189 USE 1ST'
--SET @Number='SEE NOTES'
--SET @Number='07900 000415 HO ONLY'
--SET @Number='try 1st 0770 0000694 then home'
SET @Number='07500 000465 Cannot '



SELECT REPLACE(SUBSTRING(@Number, case when CHARINDEX ('07',@Number ) =0 then Null 
else CHARINDEX ('07',@Number )end , 12),' ','')

First, find the starting position of the '07' string; if it is 0 (as for 'SEE NOTES'), return NULL. Then take the next 12 characters, and finally remove the spaces.

Claudia
I like this solution, although you're assuming that the number will always have spaces in. I suppose I didn't put an example without spaces in my sample data, so my bad! ;-) It can easily be rectified by doing the replace first and assuming a length of 11 instead.
DoctaJonez
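A possible sketch of the "replace first, then take 11 characters" variant suggested in the comment above. Going slightly beyond that suggestion, it uses PATINDEX rather than CHARINDEX so that '07' only matches when nine more digits follow it; that extra check is an assumption of this sketch. @NoSpaces and @Start are illustrative names, and inline DECLARE initialisation assumes SQL Server 2008 or later.

```sql
DECLARE @Number varchar(50) = 'try 1st 0770 0000694 then home';

-- Strip the spaces first, so a number is always 11 contiguous digits.
DECLARE @NoSpaces varchar(50) = REPLACE(@Number, ' ', '');

-- Find '07' followed by nine more digits (11 digits in total).
DECLARE @Start int =
    PATINDEX('%07[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]%', @NoSpaces);

-- No match (e.g. 'SEE NOTES') gives @Start = 0 and therefore NULL.
SELECT CASE WHEN @Start > 0 THEN SUBSTRING(@NoSpaces, @Start, 11) END AS CleanNumber;
```

For the sample value above this should yield 07700000694.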