Greetings! I have a Spring app and a form getting validated on the back and front ends. On the back end I'm using annotation based validation for the EMAIL field with help from org.springmodules.validation. So far so good.

On the front end I decided to use the jQuery Form Validation plugin and discovered that front and back validation are out of sync with one another.

For instance: [email protected] will pass the jQuery validation but not the Spring one. I looked at both regexes and my eyes crossed.

Would anyone be kind enough to comment on the trade-offs, advantages/disadvantages of using either one?

Here they are:

jQuery Validation Plugin regex: (original source here)

^(([A-Za-z0-9]+_+)|([A-Za-z0-9]+\-+)|([A-Za-z0-9]+\.+)|([A-Za-z0-9]+\++))*[A-Za-z0-9]+@((\w+\-+)|(\w+\.))*\w{1,63}\.[a-zA-Z]{2,6}$

org.springmodules.validation regex:

^((([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+(\.([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+)*)|((\x22)((((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(([\x01-\x08\x0b\x0c\x0e-\x1f\x7f]|\x21|[\x23-\x5b]|[\x5d-\x7e]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(\\([\x01-\x09\x0b\x0c\x0d-\x7f]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]))))*(((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(\x22)))@((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?$

Thanks a bunch in advance!

+1  A: 

Matching emails via regex is a difficult task, and the jQuery validation regex is way(!) too simple. It is so simple that it will even fail for many email addresses which are actually valid. Once again a case of "Hey, there is only ASCII out there as valid characters, just forget about the rest of the world".

e.g. a valid e-mail address with a German umlaut character:

[email protected]

My advice is to just validate the email with the Spring backend and skip that step on the client side. Afterwards you have to send a test email anyway to check whether the address is really valid, active, ...
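To make the ASCII-only point concrete, here is a small sketch that runs the jQuery Validation Plugin pattern quoted in the question against a few addresses. The sample addresses are made up for illustration, and the pattern is copied from the question as a JavaScript literal; it is not checked against any particular plugin release.

```javascript
// Pattern copied verbatim from the question (jQuery Validation Plugin regex).
const jqueryEmail = /^(([A-Za-z0-9]+_+)|([A-Za-z0-9]+\-+)|([A-Za-z0-9]+\.+)|([A-Za-z0-9]+\++))*[A-Za-z0-9]+@((\w+\-+)|(\w+\.))*\w{1,63}\.[a-zA-Z]{2,6}$/;

// Hypothetical sample addresses, invented for this sketch.
const samples = [
  "john.doe@example.com", // plain ASCII
  "müller@example.com",   // umlaut in the local part
  "john.doe@müller.de",   // umlaut in the domain
];

// Pair each address with the result of the test.
const results = samples.map((address) => [address, jqueryEmail.test(address)]);

for (const [address, accepted] of results) {
  console.log(address, accepted ? "accepted" : "rejected");
}
```

Only the first address is accepted. In JavaScript, both `[A-Za-z0-9]` and `\w` are limited to ASCII, so any umlaut in either part makes the match fail.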

jitter
+1  A: 

Not to mention that Chinese/Arabic domain names are to be allowed in the near future. Everyone will have to change the email regex they use, because those characters are certainly not covered by [a-z]/i or \w. They will all fail.

After all, the best way to validate an email address is still to actually send an email to it. If the email address is part of user authentication (register/login/etc), then you can perfectly combine this with the user activation system. I.e. send an email containing a link with a unique activation key to the specified email address, and only allow login once the user has activated the newly created account using the link in the email.

If the purpose of the regex is just to quickly inform the user in the UI that the specified email address doesn't look like it is in the right format, the best option is still to check whether it basically matches the following regex:
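A rough sketch of the activation-key idea, assuming Node.js and its built-in `crypto` module. The names `pendingActivations`, `createActivationKey` and `activate` are hypothetical, and the in-memory `Map` stands in for whatever storage the application really uses. Sending the mail and expiring old keys are left out.

```javascript
const crypto = require("crypto");

// Hypothetical in-memory store: activation key -> email address.
// A real application would persist this, e.g. in a database table.
const pendingActivations = new Map();

// Generate a random, hard-to-guess key and remember which address it belongs to.
function createActivationKey(email) {
  const key = crypto.randomBytes(32).toString("hex"); // 64 hex characters
  pendingActivations.set(key, email);
  return key;
}

// Called when the user follows the link from the email.
// Returns the activated address, or null if the key is unknown or already used.
function activate(key) {
  const email = pendingActivations.get(key);
  if (email === undefined) return null;
  pendingActivations.delete(key); // each key works only once
  return email;
}
```

The key would be embedded in the link that is mailed to the user, and login stays blocked until `activate` has succeeded for that account.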

^([^.@]+)(\.[^.@]+)*@([^.@]+\.)+([^.@]+)$

Simple as that. Why on earth would you care about the characters used in the name and domain? It's the client's responsibility to enter a valid email address, not the server's.
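For anyone who wants to try it, here is a minimal sketch of that pattern in JavaScript. The variable name `looseEmail` and the test addresses are only for illustration; the two "invalid" samples and the quoted address are the ones raised in the comments on this answer.

```javascript
// The pattern from this answer, as a JavaScript literal.
const looseEmail = /^([^.@]+)(\.[^.@]+)*@([^.@]+\.)+([^.@]+)$/;

// Each sample is tested on its own, one address per string.
const single = {
  plain: looseEmail.test("user@example.com"),                 // true
  noDotInDomain: looseEmail.test("invalidaddress@gmailcom"),  // false
  noAtSign: looseEmail.test("invalidaddressgmail.com"),       // false
  leadingDot: looseEmail.test(".user@example.com"),           // false
  quotedLocalPart: looseEmail.test('"foo @ bar"@example.com') // false
};

// Two addresses in ONE string, separated by a newline.
// [^.@] also matches "\n", so the whole block is treated as one address.
const twoLines = looseEmail.test("invalidaddress@gmailcom\ninvalidaddressgmail.com"); // true

console.log(single, twoLines);
```

The last case is worth noting: feeding several addresses at once, one per line, can produce a match even though neither line matches on its own. Adding `\s` to the negated classes would be one way to close that gap.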

BalusC
This falsely matches invalidaddress@gmailcom and invalidaddressgmail.com
Steve Wortham
No, Steve, it doesn't. Maybe you misread it, or you didn't actually test it at all.
BalusC
Ah, I tested it in multiline mode with Javascript as well as the .NET regex engine. However, if you test just a single email address it does work as expected.
Steve Wortham
By the way, I was the one who upvoted you because I liked your answer -- I only thought I saw a hole in your regular expression.
Steve Wortham
It falsely fails on `"foo @ bar"@example.com`, which is a perfectly valid address. You forgot about quoted atoms.
Randal Schwartz
Valid point. However it is a very rare case to have an `@` in the name part. You're free to improve it :)
BalusC
A: 

Why not take advantage of already implemented solution: http://commons.apache.org/validator/apidocs/org/apache/commons/validator/routines/EmailValidator.html

egaga
A: 

There are a million possible solutions, however I like:

^[^<>\s\@]+(\@[^<>\s\@]+(\.[^<>\s\@]+)+)$

In essence here's what it's doing:

  1. Match all characters other than left & right chevrons, whitespace, and @
  2. Match an @ symbol
  3. Repeat #1
  4. Match a dot (.)
  5. Repeat #1

This means that it'll allow non-ASCII characters once the Chinese/Arabic domain names are allowed, as BalusC mentioned. But it'll catch the gross errors like a missing @ symbol, no dot in the domain name, a space, etc. Furthermore, it'll behave the same way in JavaScript as it does in any other Perl-style regular expression flavor I know of. So it's a good candidate for both client-side and server-side validation.
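The behavior described above can be checked with a short JavaScript sketch. The name `grossErrorCheck` and the sample addresses are made up for illustration.

```javascript
// The pattern from this answer, as a JavaScript literal.
const grossErrorCheck = /^[^<>\s\@]+(\@[^<>\s\@]+(\.[^<>\s\@]+)+)$/;

const checks = {
  plain: grossErrorCheck.test("user@example.com"),          // true
  nonAscii: grossErrorCheck.test("用户@例子.中国"),           // true, no ASCII restriction
  missingAt: grossErrorCheck.test("userexample.com"),       // false
  noDotInDomain: grossErrorCheck.test("user@examplecom"),   // false
  containsSpace: grossErrorCheck.test("us er@example.com"), // false
  leadingDot: grossErrorCheck.test(".user@example.com"),    // true, dots are not treated specially
};

console.log(checks);
```

Because the character classes only exclude chevrons, whitespace and @, a dot is allowed anywhere, including at the very start of the address.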

I have created test cases for this here:

http://regexhero.net/tester/?id=4a1f18cf-3dc0-4157-ab74-489a69e184ee

I'm sure you can enter some invalid addresses that this regular expression will match. But for the purposes of web validation I don't care as much about that. I mean, a regular expression is never going to be a complete replacement for real email verification anyway.

So personally my biggest concern is to trap the most common mistakes while allowing ALL potentially valid email addresses. If someone can find a valid email address that this regex won't match, I'd like to know about it. Until then, this is what I'm going to use. ;)

Steve Wortham
This falsely matches `[email protected]` and `[email protected].` (note the leading/trailing dot).
BalusC
"I mean, a regular expression is never going to be a complete replacement for real email verification anyway." And yet everyone uses it for that. {sigh}
Randal Schwartz
A: 

Yet another regexp:

/^[a-z0-9\._-]+@[a-z0-9\._-]+\.(xn--)?[a-z0-9]{2,}$/i

It supports IDN (internationalized domain names) if written in ASCII (e.g. xn--lvaro-wqa.es). Otherwise you'd need to implement Punycode encoding.

I love it because it's very simple. It only looks for obvious typing mistakes. As already mentioned, that's the best approach. The more complicated your expression gets, the more valid e-mails you'll reject. Real e-mail validation can only be accomplished by actually sending an e-mail message.
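A quick sketch of how this expression behaves, using the Punycode domain mentioned above plus a few made-up addresses. The name `simpleEmail` is only for illustration.

```javascript
// The expression from this answer.
const simpleEmail = /^[a-z0-9\._-]+@[a-z0-9\._-]+\.(xn--)?[a-z0-9]{2,}$/i;

const outcome = {
  plain: simpleEmail.test("user@example.com"),         // true
  upperCase: simpleEmail.test("User@Example.COM"),     // true, thanks to the i flag
  punycode: simpleEmail.test("user@xn--lvaro-wqa.es"), // true, IDN written in ASCII
  rawIdn: simpleEmail.test("user@álvaro.es"),          // false, needs Punycode encoding first
  noTld: simpleEmail.test("user@example"),             // false
  plusTag: simpleEmail.test("user+tag@example.com"),   // false, "+" is not in the character class
};

console.log(outcome);
```

The last line shows the trade-off: the character class for the local part is short, so an address containing `+` is turned away. Adding `+` to the first class would be one way to accept those.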

Álvaro G. Vicario