Hi, I have these two tables:

data
id | email
---+------------------
1  | [email protected]
2  | [email protected]
3  | zzzgimail.com

errors
error      | correct
-----------+-----------
@gmial.com | @gmail.com
gimail.com | @gmail.com

How can I select from data all the records with an email error? Thanks.

A: 

Well, in reality you can't with the info you have provided.

In SQL you would need to maintain a table of "correct" domains. With that you could do a simple query to find non-matches.
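
As a rough sketch of that idea: the @valid_domains table is hypothetical, and every sample address except 'zzzgimail.com' is made up for illustration rather than taken from the question. Something along these lines would list each row whose email does not end in a known-good domain:

```sql
-- Hypothetical sample data; only 'zzzgimail.com' comes from the question.
declare @data table (
    id int,
    email varchar(100)
);

insert into @data (id, email)
    select 1, 'alice@gmail.com' union all
    select 2, 'bob@gmial.com' union all
    select 3, 'zzzgimail.com';

-- Hypothetical list of domains considered correct.
declare @valid_domains table (
    domain varchar(100)
);

insert into @valid_domains (domain)
    select 'gmail.com';

-- Rows whose email does not end in '@' + one of the listed domains.
select d.id, d.email
    from @data d
    where not exists (
        select 1
            from @valid_domains v
            where right(d.email, len(v.domain) + 1) = '@' + v.domain
    );
```

Note that this flags any address on an unlisted domain, not only typos, so the list would have to cover every domain you actually accept.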

You could use some "non" SQL functionality in SQL Server to do a regular expression check; however, that kind of logic does not belong in SQL (IMO).

Dustin Laine
+1  A: 

Assuming the error is always at the end of the string:

declare @data table (
    id int,
    email varchar(100)
)

insert into @data
    (id, email)
    select 1, '[email protected]' union all
    select 2, '[email protected]' union all
    select 3, 'zzzgimail.com'

declare @errors table (
    error varchar(100),
    correct varchar(100)
)

insert into @errors
    (error, correct)
    select '@gmial.com', '@gmail.com' union all
    select 'gimail.com', '@gmail.com'   

select d.id, 
       d.email, 
       isnull(replace(d.email, e.error, e.correct), d.email) as CorrectedEmail
    from @data d
        left join @errors e
            on right(d.email, LEN(e.error)) = e.error
Joe Stefanelli
thank you joe, this works perfectly, many thanks.
eiefai
sorry joe, I had to change my accepted response, but I gave you an up vote, hope you don't mind
eiefai
@eiefai: No problem at all.
Joe Stefanelli
+1  A: 
SELECT d.id, d.email
FROM data d
    INNER JOIN errors e ON d.email LIKE '%' + e.error

This would do it. However, a LIKE with a wildcard at the start of the pattern prevents an index seek on the email column, so you may see poor performance.

A more efficient approach would be to define a computed column on the data table that holds the REVERSE of the email field, and index it. This turns the above query into a LIKE condition with the wildcard at the end, like so:

SELECT d.id, d.email
FROM data d
    INNER JOIN errors e ON d.emailreversed LIKE REVERSE(e.error) + '%'

In this case, performance would be better as it would allow an index to be used.
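
For reference, the computed column and its index might be set up roughly like this. It is only a sketch: it assumes data is a permanent table with a varchar email column, and the index name IX_data_emailreversed is made up.

```sql
-- Add a persisted computed column holding the reversed email.
-- REVERSE is deterministic, so the column can be persisted and indexed.
ALTER TABLE data
    ADD emailreversed AS REVERSE(email) PERSISTED;
GO  -- batch separator (SSMS/sqlcmd) so the new column is visible below

-- Index the reversed value so a trailing-wildcard LIKE can seek on it.
-- The index name is hypothetical.
CREATE INDEX IX_data_emailreversed
    ON data (emailreversed);
```

Whether the optimizer actually seeks on this index for the join depends on the data and the plan it picks, so it is worth checking the execution plan.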

I blogged a full write up on this approach a while ago here.

AdaTheDev
thanks adathedev, this works better.
eiefai
A: 
select * from 
(select 1 as id, '[email protected]' as email union
 select 2 as id, '[email protected]' as email union
 select 3 as id, 'zzzgimail.com' as email) data join

(select '@gmial.com' as error, '@gmail.com' as correct union
 select 'gimail.com' as error, '@gmail.com' as correct ) errors

 on data.email like '%' + error + '%' 

I think that if the pattern did not start with a wildcard, and only had wildcards later on, the query could benefit from an index. A full text search might benefit it too.

Dr. Zim