I'm wondering how far people should take the validation of e-mail address. My field is primarily web-development, but this applies anywhere.

I've seen a few approaches:

  • simply checking if there is an "@" present, which is dead simple but of course not that reliable.
  • a more complex regex test for standard e-mail formats
  • a full regex against RFC 2822 - the problem with this is that often an e-mail address might be valid but it is probably not what the user meant
  • DNS validation
  • SMTP validation

As many people might know (but many don't), e-mail addresses can take a lot of strange forms that most people don't usually consider (see RFC 2822, section 3.4.1). But you have to think about the goals of your validation: are you simply trying to ensure that an e-mail can be sent to the address, or that the address is what the user probably meant to type (which is unlikely in a lot of the more obscure cases of otherwise 'valid' addresses)?

An option I've considered is simply giving a warning with a more esoteric address but still allowing the request to go through, but this does add more complexity to a form and most users are likely to be confused.

While DNS validation / SMTP validation seem like no-brainers, I foresee problems where the DNS server/SMTP server is temporarily down and a user is unable to register somewhere, or the user's SMTP server doesn't support the required features.

How might some experienced developers out here handle this? Are there any other approaches than the ones I've listed?

Edit: I completely forgot the most obvious of all, sending a confirmation e-mail! Thanks to answerers for pointing that one out. Yes, this one is pretty foolproof, but it does require extra hassle on the part of everyone involved. The user has to fetch some e-mail, and the developer needs to remember user data before they're even confirmed as valid.
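To make the "remember user data before they're confirmed" part concrete, here is a minimal Node.js sketch of the bookkeeping behind a confirmation link. Everything in it is an assumption for illustration: the in-memory `pending` map, the 24-hour validity window, and the function names `startConfirmation` and `finishConfirmation`. Actually sending the e-mail is left out.

```javascript
const crypto = require("crypto");

// Hypothetical in-memory store of unconfirmed sign-ups: token -> { email, expiresAt }.
// A real application would more likely keep this in a database.
const pending = new Map();
const TOKEN_TTL_MS = 24 * 60 * 60 * 1000; // assumed 24-hour validity window

// Remember the address and return a random token to embed in the confirmation link.
function startConfirmation(email, now = Date.now()) {
    const token = crypto.randomBytes(32).toString("hex");
    pending.set(token, { email, expiresAt: now + TOKEN_TTL_MS });
    return token;
}

// Called when the user follows the link. Returns the confirmed address,
// or null if the token is unknown or expired. Each token works only once.
function finishConfirmation(token, now = Date.now()) {
    const entry = pending.get(token);
    if (!entry) return null;
    pending.delete(token);
    return now <= entry.expiresAt ? entry.email : null;
}
```

The token is the only thing that goes into the link, so nothing about the user needs to be trusted until `finishConfirmation` returns an address.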

+15  A: 

There is no 100% reliable way of confirming a valid email address other than sending an email to the user and waiting for a response, as most forums do.

I would go with the simple "@" validation rule and then email the user to confirm their email address.

Although, this is my personal opinion... I await other suggestions.

GateKiller
+1  A: 

I think it depends on what context you're using the email for. More serious projects require stricter validation, but I think for most things sending an email to the provided address with a confirmation link will ensure the email address is valid.

DShook
+10  A: 

You're best off just checking for simple things like @ and . in JavaScript, and then actually sending a verification message to their address. If they verify their account, you have yourself a valid email address. That way you know for sure you have a working address, and you don't have to be too bossy in the form.

superjoe30
+19  A: 

One suggestion: don't reject addresses with a + in them. It's annoyingly common to reject them, but it's a valid character, and gmail users can use [email protected] to label and sort incoming mail more easily.

Peter Burns
+1 !!! It seems impossible to filter emails from Facebook, of all sites!
jnylen
Can filter without that - just use the sender info
Casebash
+1  A: 

Depends on the goal. If you are an ISP and you need to validate that users are creating valid email addresses, go for the Regex that validates against everything possible. If you just want to catch user errors, how about the following pattern:

[All Characters, no spaces] @ [letters and numbers] (.[letters and numbers]) where the final group appears at least one time.

The RegEx for this would appear something like this:

[\S]+@[\w]+(.[w])+

And then send a confirmation email to be sure.

Yaakov Ellis
There is a typo (unless you only want to allow "w" as the TLD) and it doesn't match domains with a hyphen (-). This works better: [\S]+@[\w-]+(.[\w-]+)+
some
+2  A: 

RegexBuddy offers the following email-related regular expressions from its library:

Email address (basic)

\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b

Email address (RFC 2822, simplified)

[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?

But I tend to agree with Peter and SuperJoe's responses; the only real "testing" is actually sending a validation email.

Jeff Atwood
+5  A: 

Having considered the answers (since I completely forgot about confirmation e-mails), it seems to me that a suitable compromise for a low-friction solution would be to:

  1. Use regex to check that the e-mail address looks valid, and give a warning if it is more obscure but avoid rejecting outright.
  2. Use SMTP validation to ensure that the e-mail address is valid.
  3. If SMTP validation fails then -- and only then -- use a confirmation e-mail as a last resort. Confirmation e-mails seem to require too much interaction outside of your application for them to be considered low friction, but they are a perfect fallback.
Mike Tomasello
How can you check that a user owns the email address without sending confirmation? For example, let's say they typed in your email address. Wouldn't you be annoyed that the developer decided not to provide a confirmation email and therefore allowed anybody who knows your address to sign you up to their site?
Rupert
+1  A: 

@Mike - I think that part of the reason why confirmation emails are sent is not only to ensure that the email address is valid, but that it is accessible by the user who submitted it. A person could easily put in a one-letter typo in the email address that would lead to a different, valid email address, but that would still be an error as it would be the wrong address.

Yaakov Ellis
A: 

@Yaakov (could really do with some sort of 'replying' here)

"I think that part of the reason why confirmation emails are sent is not only to ensure that the email address is valid, but that it is accessible by the user who submitted it. A person could easily put in a one-letter typo in the email address that would lead to a different, valid email address, but that would still be an error as it would be the wrong address."

I concur, but I'm not sure it's worth it. We also have confirmation fields for that purpose (repeating your e-mail address again). Another situation where the type of site might warrant different approaches.

Additionally, sending a confirmation e-mail gives the original user no indication that the address they entered was wrong. After not receiving the confirmation e-mail they may just assume your app/site is faulty; if the user is allowed to start using their account immediately, they at least have a chance to correct their e-mail address, particularly if it is displayed in a suitably obvious place.

Mike Tomasello
+9  A: 

Another weakness of using a regex for email validation is that it's almost impossible to catch all the valid top-level domains while rejecting all the invalid ones.

For instance, the basic email regex in Jeff Atwood's reply:

\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b

will accept any TLD of two to four characters. So, for example, .spam will be accepted, but .museum and .travel (both valid TLDs) will be rejected.

Just one more reason it's better to just look for the @, and send a confirmation email.
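The TLD length limit is easy to see by running the basic pattern quoted above against a few addresses. This is only a quick sketch: the `i` flag is an assumption (the pattern as written is upper-case only), and the sample addresses are made up.

```javascript
// The "basic" pattern quoted above; the i flag is an assumption so that
// lower-case addresses match as well.
const basicEmail = /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i;

// Made-up sample addresses: two short TLDs, two long ones.
const samples = [
    "user@example.com",
    "user@example.spam",
    "curator@example.museum",
    "agent@example.travel",
];

for (const address of samples) {
    console.log(address, basicEmail.test(address));
}
```

The first two addresses pass and the last two fail, purely because of how many letters follow the final dot.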

Bruce Alderman
+18  A: 

In your post it seems that when you say "SMTP validation" you mean connecting to the server and trying an RCPT TO to see if it's accepted. Since you differentiate it from actually sending a confirmation email, I assume you want to do it inline with the user's actions. In addition to problems like network issues, DNS failures, etc., greylisting can wreak havoc with this method. Implementations vary, but essentially greylisting defers the first delivery attempt to a given recipient from each connecting IP. As I said, this can vary: some hosts might reject invalid addresses at the first attempt and only defer valid addresses, but there's no reliable way to sort out the different implementations programmatically.
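A rough sketch of what that means for code that probes with RCPT TO: only a permanent failure tells you anything definite, and a temporary failure has to be treated as "don't know". The reply classes (2xx, 4xx, 5xx) are standard SMTP; the function name `classifyRcptReply` and the labels it returns are made up for this example, and the network side is omitted entirely.

```javascript
// Interpret the three-digit SMTP reply code to a RCPT TO command.
// Only the reply classes come from SMTP itself; the labels are hypothetical.
function classifyRcptReply(code) {
    if (code >= 200 && code < 300) return "accepted"; // server says yes, for whatever that is worth
    if (code >= 400 && code < 500) return "unknown";  // temporary failure, e.g. greylisting
    if (code >= 500 && code < 600) return "rejected"; // permanent failure
    return "unknown";                                 // anything unexpected: assume nothing
}
```

An "unknown" result should not block registration, since a greylisting host answers that way for perfectly good addresses.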

The only way you'll ever be sure that an address is valid and is submitted by its owner who really does want it used for your application is to send a verification email. Well, as long as it doesn't get spam filtered I guess =).

jj33
A: 

All the regex validation in the world won't prevent someone from entering an incorrect or fake email address. It's just annoying, really.

Kevin
+1  A: 

The most complete and accurate regex I've ever encountered for email validation is the one documented here. It is not for the faint of heart; it's complicated enough that it's broken into parts to make it easier for humans to parse (sample code is in Java). But in cases where going all the way with validation is merited, I don't think it gets much better.

In any case, I would suggest that you use unit testing to confirm that your expression covers the cases that you feel are important. That way, as you dink around with it, you can be sure that you haven't broken some case that worked before.
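As a minimal sketch of that kind of test, assuming Node.js and its built-in `assert` module: the validator `looksLikeEmail` below is a deliberately permissive stand-in (not the expression linked above), and the list of cases is only an illustration of the inputs you might care about.

```javascript
const { strictEqual } = require("assert");

// Hypothetical validator under test: deliberately permissive, it only
// requires non-space characters on both sides of an "@".
function looksLikeEmail(address) {
    return /^\S+@\S+$/.test(address);
}

// Each entry pairs an input with the result we expect from the validator.
const cases = [
    ["user@example.com", true],
    ["user+label@example.com", true], // plus sign in the local part
    ["o'brien@example.com", true],    // apostrophe in the local part
    ["user@example.com.au", true],    // two-level country domain
    ["no-at-sign.example.com", false],
    ["two words@example.com", false],
    ["", false],
];

// Throws on the first mismatch; returns the number of cases checked.
function runCases() {
    for (const [address, expected] of cases) {
        strictEqual(looksLikeEmail(address), expected, `unexpected result for "${address}"`);
    }
    return cases.length;
}

runCases();
```

Swapping in a stricter expression and re-running the same table shows immediately which previously accepted addresses it breaks.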

Robert J. Walker
+1  A: 

I've worked at 4 different companies where someone at the help desk got yelled at by someone named O'Malley or O'Brien, or someone else whose e-mail address contains an apostrophe. As suggested previously, not all regexes will catch everything, but save yourself some hassle and accept an apostrophe without generating a warning.

bmb
Amen to that. The same goes for the hash sign (#), by the way.
Tomalak
Just because people's names don't contain hash marks doesn't mean that they're not valid in email addresses.
Dave Sherohman
+1  A: 

Whatever you choose, I think you need to err on the side of believing that 99% of the time, the user does actually know what their email address is. As someone from Australia, I still very occasionally find an oh-so-clever email validation that tells me that I can't possibly have a .com.au domain. It used to happen a lot more in the early days of the internet, mind you.

Sending a confirmation email these days is acceptable to users, and is also useful in terms of opt-in as well as validating their supplied address.

warren_s
A: 

Horses for courses.

All those are valid, complete email verification systems in and of themselves, and for a given website one will be more appropriate (or as good as is warranted) than the others. In many cases several steps of verification may be useful.

If you're developing a website for a bank, you're going to want snail mail or phone verification on top of all these.

If you're developing a website for a contest you might not want any of them - verify the emails in post processing and if one fails it's too bad for the person who entered it - you might value server performance given a huge crush of people (TV contest, for instance) over making sure that everyone gets validated correctly inline.

How far should one take email verification?

As far as necessary and warranted.

And no further (KISS)

Adam Davis
A: 

I've seen sites that also guard against people using temporary throwaway spam bucket sites like Mailinator or MyTrashMail, which get around the confirmation e-mail thing. I'm not saying you should be filtering those out, I'm just saying.

jodonnell
In what way does it "get around" the e-mail confirmation? Both Mailinator and MyTrashMail will accept subsequent mails to the same address. If the user doesn't bother to check them, that's another story.
bzlm
A: 

On some sites developed at places I have worked at, we have always used confirmation emails. However, it was surprisingly common for the users to mistype their email address in ways that could not possibly have worked, and then keep waiting for the confirmation email which would not come. Adding ad-hoc code (or, for the domain name part, DNS verification) to warn the user in these cases could be a good idea.

The common cases I have seen:

  • Dropping a letter in the middle of the domain name, or several other simple typo variants.
  • TLD confusion (for instance, adding a .br to a .com domain, or dropping the .br from a .com.br domain).
  • Adding a www. at the beginning of the local part of an email address (I am not making this up; I saw several email addresses of the form [email protected]).

There were even more bizarre cases; things like a complete domain name as the local part, addresses with two @ (something like [email protected]@example.com), and so on.

Of course, most of them were still valid RFC-822 addresses, so technically you could just let the MTA deal with them. However, warning the user that the email address entered is quite possibly bogus can be helpful, especially if your target audience is not very computer literate.
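A sketch of what such ad-hoc warning code might look like. It only covers three of the cases above (a doubled @, a leading "www.", and a domain that is one character away from a well-known one); the `COMMON_DOMAINS` list, the function names, and the wording of the warnings are all assumptions for illustration. It never rejects anything; it only returns warnings to show to the user.

```javascript
// Hypothetical list of domains that are common among your users; adjust for your audience.
const COMMON_DOMAINS = ["gmail.com", "hotmail.com", "yahoo.com", "yahoo.com.br"];

// Classic Levenshtein edit distance between two strings.
function editDistance(a, b) {
    let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
    for (let i = 1; i <= a.length; i++) {
        const curr = [i];
        for (let j = 1; j <= b.length; j++) {
            const cost = a[i - 1] === b[j - 1] ? 0 : 1;
            curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
        }
        prev = curr;
    }
    return prev[b.length];
}

// Returns human-readable warnings; an empty array means nothing looked odd.
function emailWarnings(address) {
    const at = address.lastIndexOf("@");
    if (at < 0) return ["There is no @ in the address."];

    const warnings = [];
    const local = address.slice(0, at);
    const domain = address.slice(at + 1).toLowerCase();

    if (local.includes("@")) warnings.push("The address contains more than one @.");
    if (/^www\./i.test(local)) warnings.push('The address starts with "www.".');

    // Suggest a well-known domain if the one entered is a single edit away from it.
    if (!COMMON_DOMAINS.includes(domain)) {
        const near = COMMON_DOMAINS.find((d) => editDistance(domain, d) === 1);
        if (near) warnings.push(`Did you mean ${local}@${near}?`);
    }
    return warnings;
}
```

TLD confusion such as a dropped .br is more than one edit away, so it would need its own rule or a DNS lookup on the domain.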

CesarB
Sounds like the problem with those sites was that users were forced to enter information they really didn't understand. Not all sites require e-mail contact with their users, even if it would make things easier for the site.
bzlm
+5  A: 

With international domain names almost everything is possible:

  • Håkan.Söderström@malmö.se
  • [email protected]
  • 试@例子.测试.مثال.آزمایشی

If you want to do any tests you should first convert it to punycode.
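In Node.js, one way to do that conversion is the built-in `url.domainToASCII`, which returns an empty string for a domain it cannot convert. The wrapper below is only a sketch: the name `toAsciiAddress` is made up, and it assumes that only the domain part should be converted while the local part is passed through untouched.

```javascript
const { domainToASCII } = require("url");

// Convert only the domain part to its ASCII (punycode) form; the local part
// is left as it is. Returns null when there is no usable "@" or when the
// domain cannot be converted.
function toAsciiAddress(address) {
    const at = address.lastIndexOf("@");
    if (at < 1) return null;
    const asciiDomain = domainToASCII(address.slice(at + 1));
    return asciiDomain === "" ? null : address.slice(0, at) + "@" + asciiDomain;
}
```

Any further checks on the domain (DNS lookups, pattern tests) can then be run against the ASCII form.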

Without punycode all you should do is to test that there:

  • is at least one @
  • is at least one character in the local part
  • is at least one dot in the domain part
  • is at least four characters in the domain (assuming that no-one has an address at the tld, that the tld is at least 2 chars)

Here is the code:

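// Minimal sanity check: a non-empty local part, an "@", and a domain
// of at least four characters that contains a dot.
function isEmail(address) {
    var pos = address.lastIndexOf("@");
    return pos > 0 && (address.lastIndexOf(".") > pos) && (address.length - pos > 4);
}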
some
+3  A: 

Use an open-source validator which doesn't give false negatives. Zero effort for you and robust validation for your app.

I've now collated test cases from Cal Henderson, Dave Child, Phil Haack, Doug Lovell and RFC 3696. 158 test addresses in all.

I ran all these tests against all the validators I could find. The comparison is here: http://www.dominicsayers.com/isemail

I'll try to keep this page up-to-date as people enhance their validators. Thanks to Cal, Dave and Phil for their help and co-operation in compiling these tests and constructive criticism of my own validator.

People should be aware of the errata against RFC 3696 in particular. Three of the canonical examples are in fact invalid addresses. And the maximum length of an address is 254 or 256 characters, not 320.

Dominic Sayers
A: 

What are you trying to catch in your email validation?

Regex validation of email addresses can, at best, verify that the address is syntactically correct and relatively plausible. It also has the hazard (as already mentioned many times) of possibly rejecting actual, deliverable addresses if the regex isn't quite correct.

SMTP verification can determine that the address is deliverable, subject to the limitations imposed by greylisting or servers which are configured to give out as little information as possible about their users. You have no way of knowing whether the MTA has only claimed to accept mail for a bogus address, then just dropped it on the floor as part of an anti-spam strategy.

Sending a confirmation message, though, is the only way to verify that an address belongs to the user who entered it. If I'm filling out your form, I can quite easily tell you that my email address is [email protected]. A regex will tell you it's syntactically valid, an SMTP RCPT TO will tell you it's a deliverable address, but it sure as hell ain't my address.

Dave Sherohman
+1  A: 

You could take email validation still further to actually test if a mailbox exists. This technique has its drawbacks (development time and also possibility of getting blacklisted for misuse). http://www.webdigi.co.uk/blog/2009/how-to-check-if-an-email-address-exists-without-sending-an-email/

Webber