OK, there are a million regexes out there for validating an email address, but how about some basic email validation that can be integrated into a T-SQL query for SQL Server 2005?

I don't want to use a CLR procedure or function. Just straight T-SQL.

Has anybody tackled this already?

+3  A: 

Very basic would be:

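SELECT
  EmailAddress,
  CASE WHEN EmailAddress LIKE '%_@_%_.__%'
            -- placeholder: substitute a real character class before running
            AND EmailAddress NOT LIKE '%[any obviously invalid characters]%'
  THEN 'Could be'
  ELSE 'Nope'
  END AS Validates
FROM
  [Table]  -- placeholder name; TABLE is a reserved word, hence the brackets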

This matches any string that contains an @ preceded by at least one character and followed by at least two characters, a dot, and at least two more characters for the TLD.
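As a quick check of how the pattern behaves, the sketch below runs it against a few made-up addresses. The table variable @Samples and its values are illustrative assumptions, not part of the original query; single-row INSERT statements are used because SQL Server 2005 does not accept multi-row VALUES lists.

```sql
-- Hypothetical sample data; @Samples and its contents are for illustration only.
DECLARE @Samples TABLE (EmailAddress VARCHAR(255) NULL);

INSERT INTO @Samples (EmailAddress) VALUES ('someone@example.com');
INSERT INTO @Samples (EmailAddress) VALUES ('x@yz.de');
INSERT INTO @Samples (EmailAddress) VALUES ('a@b.c');
INSERT INTO @Samples (EmailAddress) VALUES ('@example.com');
INSERT INTO @Samples (EmailAddress) VALUES ('someone@example');
INSERT INTO @Samples (EmailAddress) VALUES ('not an email@ab.cd');

SELECT
  EmailAddress,
  CASE WHEN EmailAddress LIKE '%_@_%_.__%'
       THEN 'Could be'
       ELSE 'Nope'
  END AS Validates
FROM @Samples;

-- someone@example.com -> Could be
-- x@yz.de             -> Could be
-- a@b.c               -> Nope     (too short after the @)
-- @example.com        -> Nope     (nothing before the @)
-- someone@example     -> Nope     (no dot after the @)
-- not an email@ab.cd  -> Could be (the pattern only checks shape, not characters)
```

The last row shows why the pattern is only a sanity check: it looks at the position of the @ and the dot, not at which characters surround them.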

You can write more LIKE patterns that do more specific things, but you will never be able to match everything that could be an e-mail address while rejecting everything that is not. Even with regular expressions you have a hard time doing it right. Additionally, even a pattern that follows the RFC to the letter accepts address constructs that most emailing systems will not accept or use.

Doing this at the database level may be the wrong approach anyway. A basic sanity check like the one above is probably the best you can get performance-wise, and doing the validation in an application will give you far greater flexibility.
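If the sanity check is meant to feed a report rather than reject rows, the same CASE expression can be grouped and counted. This is a minimal sketch under assumptions: the table variable @Emails and its rows are hypothetical stand-ins for a real table.

```sql
-- Hypothetical stand-in for a real table of addresses.
DECLARE @Emails TABLE (EmailAddress VARCHAR(255) NULL);

INSERT INTO @Emails (EmailAddress) VALUES ('someone@example.com');
INSERT INTO @Emails (EmailAddress) VALUES ('x@yz.de');
INSERT INTO @Emails (EmailAddress) VALUES ('someone@example');
INSERT INTO @Emails (EmailAddress) VALUES (NULL);

-- Classify each row in a derived table, then count rows per class.
SELECT
  Checked.Validates,
  COUNT(*) AS Addresses
FROM (
  SELECT
    CASE WHEN EmailAddress LIKE '%_@_%_.__%'
         THEN 'Could be'
         ELSE 'Nope'
    END AS Validates
  FROM @Emails
) AS Checked
GROUP BY Checked.Validates;

-- Could be -> 2
-- Nope     -> 2  (a NULL address falls through to the ELSE branch)
```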

Tomalak
Yeah, I already have regexes in code doing this for me, but I need to do reporting on tables with zillions of emails and come up with aggregates.
Eric Z Beard
Knowing the base data you have, you might be able to come up with something more specific and appropriate than what I suggested for a start, but you will never get it "correct" as the word is used in algorithm theory.
Tomalak
I see you added "NOT LIKE '%[any obviously invalid characters]%'". From what I've learned about the spec, I'm not sure there are actually any keys on the keyboard that couldn't technically be construed as valid somewhere in the address.
Eric Z Beard
That depends on your keyboard. On mine, there are 'ö', 'ä', 'ü' and 'ß', which are invalid in any case. More generally, angle brackets, for example, would not be allowed, and the list goes on.
Tomalak
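The placeholder '[any obviously invalid characters]' in the query above can be filled in many ways. The sketch below is one possible reading, assuming that angle brackets, spaces, commas and semicolons are the characters to reject; that list is an assumption for illustration, not something the thread settles on, and @Samples is a hypothetical table variable.

```sql
-- Hypothetical sample data for illustration only.
DECLARE @Samples TABLE (EmailAddress VARCHAR(255) NULL);

INSERT INTO @Samples (EmailAddress) VALUES ('someone@example.com');
INSERT INTO @Samples (EmailAddress) VALUES ('<someone@example.com>');
INSERT INTO @Samples (EmailAddress) VALUES ('some one@example.com');

SELECT
  EmailAddress,
  CASE WHEN EmailAddress LIKE '%_@_%_.__%'
            -- assumed reject list: < > space , ;
            AND EmailAddress NOT LIKE '%[<> ,;]%'
       THEN 'Could be'
       ELSE 'Nope'
  END AS Validates
FROM @Samples;

-- someone@example.com   -> Could be
-- <someone@example.com> -> Nope (contains angle brackets)
-- some one@example.com  -> Nope (contains a space)
```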
+1  A: 

Here's a sample function for this that's a little more detailed. I don't remember where I got it from (years ago) or whether I modified it; otherwise I would include proper attribution:

CREATE FUNCTION [dbo].[fnAppEmailCheck](@email VARCHAR(255))
-- Returns 1 if the string looks like a valid email address, otherwise 0.
-- Depends on dbo.fnAppStripNonEmail, which is not shown in this answer.
RETURNS bit
AS
BEGIN
     DECLARE @valid bit
     SET @valid = 0
     IF @email IS NOT NULL
     BEGIN
          SET @email = LOWER(@email)
          -- Commas are literal inside [...], so the classes list characters without separators.
          IF @email LIKE '[a-z0-9_-]%@[a-z0-9_-]%.[a-z][a-z]%'
             AND LEN(@email) = LEN(dbo.fnAppStripNonEmail(@email))
             AND @email NOT LIKE '%@%@%'
             AND CHARINDEX('.@',@email) = 0
             AND CHARINDEX('..',@email) = 0
             AND CHARINDEX(',',@email) = 0
             AND RIGHT(@email,1) BETWEEN 'a' AND 'z'
               SET @valid = 1
     END
     RETURN @valid
END
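The function above calls dbo.fnAppStripNonEmail, which is not included in the post. The following is only a guess at what such a helper might look like, assuming its job is to remove every character outside a-z, 0-9, @, dot, underscore and hyphen; the allowed set and the loop are assumptions, not the original implementation.

```sql
-- Hypothetical sketch of the missing helper; the allowed character set is an assumption.
CREATE FUNCTION [dbo].[fnAppStripNonEmail](@email VARCHAR(255))
RETURNS VARCHAR(255)
AS
BEGIN
     DECLARE @pos INT

     -- Find the first character that is NOT in the allowed set.
     SET @pos = PATINDEX('%[^a-z0-9@._-]%', @email)

     WHILE @pos > 0
     BEGIN
          -- Delete that one character, then look for the next offender.
          SET @email = STUFF(@email, @pos, 1, '')
          SET @pos = PATINDEX('%[^a-z0-9@._-]%', @email)
     END

     RETURN @email
END
```

With a helper like this, the LEN comparison in fnAppEmailCheck fails whenever the address contains a character outside the allowed set. Note that range matching such as [a-z] follows the column or database collation, so under some collations accented letters may fall inside the range; applying a binary collation with COLLATE is one way to tighten it.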
mikeh
A: 

Building on Tomalak's SELECT:

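-- @email is assumed to be declared and set elsewhere, e.g. DECLARE @email VARCHAR(255)
SELECT 1
WHERE @email NOT LIKE '%[^a-z0-9@.]%'  -- commas are literal inside [...], so none are used as separators
  AND @email LIKE '%_@_%_.__%'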
payonk