I'm really not confident with Regex, I know some basic syntax but not enough to keep me happy.

I'm trying to build a regular expression to check if an email is valid. So far here's what I've got:

[A-Za-z0-9._-]+@[A-Za-z0-9]+.[A-Za-z.]+

It needs to take account of periods in the username/domain and I think it works with multiple TLDs (e.g. co.uk).

I'm working with the preg engine in PHP so it needs to work with that. Thanks if you can help!

+4  A: 

Try

[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?

(from http://www.regular-expressions.info/email.html)

The much longer fully-compliant parser is unnecessary; it's okay if you have a few false positives (people will give nonexistent email addresses no matter how complex your regex is).
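
A rough sketch of how that pattern might be used with preg_match in PHP, assuming you want the whole string to match and don't care about case. The function name looksLikeEmail is made up for illustration, and the anchors, the "i" and "D" modifiers and the escaped "/" are my additions, not part of the pattern quoted above:

```php
<?php
// Hypothetical helper: wraps the regular-expressions.info pattern for use
// with PHP's preg engine. Type hints need PHP 7 or later.
function looksLikeEmail(string $address): bool
{
    // One dot-free run of allowed local-part characters. A nowdoc is used so
    // that "$", "'" and the backslash need no PHP-level escaping. The "/" is
    // escaped only because "/" is the regex delimiter below.
    $atom = <<<'RE'
[a-z0-9!#$%&'*+\/=?^_`{|}~-]+
RE;

    // Local part: runs separated by single dots.
    $local = $atom . '(?:\.' . $atom . ')*';

    // Domain: at least two labels; a label may not start or end with "-".
    $label  = '[a-z0-9](?:[a-z0-9-]*[a-z0-9])?';
    $domain = '(?:' . $label . '\.)+' . $label;

    // ^...$ anchors the match to the whole string, "i" makes it
    // case-insensitive, and "D" stops "$" matching before a trailing newline.
    $pattern = '/^' . $local . '@' . $domain . '$/iD';

    return preg_match($pattern, $address) === 1;
}
```

Called as looksLikeEmail("o'neil@example.co.uk"), this accepts the apostrophe and the two-part TLD, while something like "user@localhost" is rejected because the pattern insists on at least one dot in the domain.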

sysrqb
+8  A: 

Email addresses are extremely difficult to validate by regex, and even if you manage it, a match tells you nothing about whether the address actually exists.

If you still want to do it by regex, see here: http://ex-parrot.com/~pdw/Mail-RFC822-Address.html Note that even that monstrosity has a disclaimer about applicability.

Anonymouse
+2  A: 

Is there much point in using a Regex then? Should I just go for the faster and easier option of (strpos($str, '@'))?

Just found that PHP has in-built input filtering and am going to use that instead.
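
For anyone else landing here, a minimal sketch of what I mean, assuming the filter extension is available (it ships with PHP 5.2 and later). The wrapper name isPlausibleEmail is just something I made up:

```php
<?php
// Hypothetical wrapper around PHP's built-in email filter.
// Type hints need PHP 7 or later.
function isPlausibleEmail(string $address): bool
{
    // filter_var() returns the filtered value on success and false on
    // failure, so compare against false explicitly.
    return filter_var($address, FILTER_VALIDATE_EMAIL) !== false;
}
```

If you do go the strpos route instead, compare the result with !== false, because strpos returns 0 (which is falsy) when the '@' is the first character.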

Ross
Note that PHP's "validate email" filter does not actually comply with the RFC, so you may not want to trust it if you're trying to be 100% accurate.
Chad Birch
+8  A: 

Validating email addresses is a good example of where using regular expressions might not be a smart idea.

However, a simple permissive approach that lets some "invalid" addresses through is often reasonable - you'll have to cope with email delivery failures anyway, so accepting the odd bogus address hardly matters.

"How to Find or Validate an Email Address" on regular-expressions.info gives a nice overview of different trade-offs between complexity and correctness and suggests several patterns that might work for your needs, like:

^[0-9A-Za-z._%+-]+@[0-9A-Za-z.-]+\.[A-Za-z]{2,6}$
gz
Only uppercase? No support of .museum (for example)? Advice is good, example is poor (and outdated).
PhiLho
Fair comment, PhiLho. The example is a little misleading outside the context of the second link, which states that it is "...intended to be used with your regex engine's 'case insensitive' option turned on" (it is also more of a scanning than a checking example). I have switched to a clearer example.
gz
An address with an IDN TLD is also not validating (IDN, now a valid TLD). Advice good, example too restrictive, +0 in total ;)
Piskvor
A: 

The regular-expressions.info article linked by sysrqb is a thorough exploration of the issues.

http://www.regular-expressions.info/email.html

I believe it's better to err on the side of generosity when you're not absolutely sure what data is valid. For example, your regex excludes the perfectly valid apostrophe character, so the O'Neils of the world might find themselves shut out of your service.

Nick Higgs
+2  A: 

Indeed, given the complexity of RFC 2822, one should err on the permissive side.
I think now I would only check that there is an @ with something before and after it, which mostly catches user slips (leaving the field empty, giving a username instead of an address...).
That way, we won't lock out the (admittedly rare) people with addresses like
"John B. O'Reilly"@Foo.museum or e=m.c^2@[55.145.88.44]...

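A minimal sketch of that kind of check in PHP, with no regex at all. The function name hasAtWithBothSides is hypothetical, and using the last "@" as the split point is my assumption (a quoted local part may itself contain one):

```php
<?php
// Hypothetical helper: true when the string has an "@" with at least one
// character on each side. Type hints need PHP 7 or later.
function hasAtWithBothSides(string $address): bool
{
    // Position of the last "@", or false when there is none.
    $at = strrpos($address, '@');

    return $at !== false                // there is an "@"
        && $at > 0                      // something before it
        && $at < strlen($address) - 1;  // something after it
}
```

Both of the unusual addresses above pass this check, while an empty field, a bare username, "@example.com" and "user@" are all rejected.
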
PhiLho
+1  A: 

You can test your RegExp here: http://gskinner.com/RegExr/

Nice little app that really helped me along the way :)

sonstabo
+1  A: 

For a very comprehensive discussion on this topic please see this link: "Comparing E-mail Address Validating Regular Expressions"

spoon16
A: 

Here's a page where you can download some free PHP email address validators: Dominic Sayers - RFC-compliant email address validator.

You can see the results of which test cases the various validators get right/wrong here: The Validators - head-to-head

Chad Birch