
In PHP, I use this regex for checking e-mail addresses:

$rexMail = "/^[a-z0-9\._]+@{1}[a-z0-9-_]+\.{1}[a-z]{2,4}\.?[a-z]{0,2}$/i";

In most cases, this will suffice. However, the following address is accepted by the regex:

[email protected]

That shouldn't be possible. Multiple dots should be allowed before the @ sign, but it shouldn't be possible to have several right after each other.
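A quick check reproduces the behavior described above. This is a minimal sketch: the address john..doe@example.com is a hypothetical stand-in for the one in the question, and the pattern is the question's regex copied verbatim.

```php
<?php
// The question's pattern, copied verbatim. Single quotes keep PHP from
// processing any escape sequences inside the string.
$rexMail = '/^[a-z0-9\._]+@{1}[a-z0-9-_]+\.{1}[a-z]{2,4}\.?[a-z]{0,2}$/i';

// Hypothetical address with two dots in a row before the @ sign.
$address = 'john..doe@example.com';

// preg_match() returns 1 for a match and 0 for no match.
// [a-z0-9\._]+ happily consumes "john..doe", so this prints int(1).
var_dump(preg_match($rexMail, $address));
```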

I'm not that good with regex and I don't know how to solve this.

Also, I'm not sure how many dots to allow after the @ sign, since there are addresses ending in .co.uk, or worse.
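On the question of dots after the @ sign, the sketch below runs the question's pattern against a few hypothetical domains. The domain part allows exactly one label, one dot, 2 to 4 letters, then an optional dot and up to two more letters, so a two-part suffix like .co.uk fits but a host with several subdomains does not.

```php
<?php
// The question's pattern, copied verbatim.
$rexMail = '/^[a-z0-9\._]+@{1}[a-z0-9-_]+\.{1}[a-z]{2,4}\.?[a-z]{0,2}$/i';

// Hypothetical sample addresses with increasingly deep domains.
$samples = [
    'user@example.com',
    'user@example.co.uk',
    'user@mail.company.subcompany.com',
];

foreach ($samples as $address) {
    $result = preg_match($rexMail, $address) ? 'match' : 'no match';
    printf("%-34s %s\n", $address, $result);
}
```

The first two addresses match; the third does not, because [a-z0-9-_]+ cannot span the dots between subdomains.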

+1  A: 

Here:

/^([a-z0-9_]\.?)*[a-z0-9_]+@([a-z0-9-_]+\.)+[a-z]{2,3}$/i

The part of the email address after the @ can be any valid domain, so you need to allow any number of dot-separated labels before the TLD; for instance, email.staff.mycompany.com is a valid host. Also, a top-level domain can have more than 2 characters; many of the common ones have 3 (.com, .net, etc.).
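A minimal sketch that runs the pattern above against some hypothetical addresses. It shows that any number of labels is now accepted after the @ sign, and that a doubled dot before it is rejected, since each character in ([a-z0-9_]\.?)* may be followed by at most one dot.

```php
<?php
// The pattern from this answer, copied verbatim.
$rex = '/^([a-z0-9_]\.?)*[a-z0-9_]+@([a-z0-9-_]+\.)+[a-z]{2,3}$/i';

// Hypothetical sample addresses.
$samples = [
    'user@email.staff.mycompany.com', // several labels before the TLD
    'first.last@example.co.uk',       // dotted local part, two-part suffix
    'john..doe@example.com',          // two dots in a row before the @
];

foreach ($samples as $address) {
    printf("%-32s %s\n", $address, preg_match($rex, $address) ? 'match' : 'no match');
}
```

The first two match and the third does not. Note that the {2,3} quantifier on the TLD also means an address such as user@example.info is rejected.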

However, VoteyDisciple's comment below is valid: technically, an email address can have .. in it.

Amber
Now even less stuff matches
WebDevHobo
Try copying the regex again; I had edited it to fix a slight issue.
Amber
There we have it, working
WebDevHobo
So in other words, my original should be good enough?
WebDevHobo
Your original would be okay, yes, though as noted the `{1}` is unnecessary.
Amber
+3  A: 

First, [email protected] is not necessarily an invalid e-mail address. The standard (RFC 5322) does allow consecutive period characters before the @ sign, as long as the local part is quoted, e.g. "john..doe"@example.com. In fact, one can have just about anything before the @ sign, including some characters you do not allow (e.g., +). So, as you've written it now, you will be rejecting a variety of perfectly valid addresses.
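To illustrate the point about rejected addresses, the sketch below feeds the question's pattern two hypothetical addresses whose local parts contain + and -. Both characters are allowed in an unquoted local part, but neither is in the class [a-z0-9\._], so both addresses are refused.

```php
<?php
// The question's pattern, copied verbatim.
$rexMail = '/^[a-z0-9\._]+@{1}[a-z0-9-_]+\.{1}[a-z]{2,4}\.?[a-z]{0,2}$/i';

// Hypothetical addresses with "+" and "-" in the local part.
$samples = ['john+newsletter@example.com', 'mary-jane@example.com'];

foreach ($samples as $address) {
    // Neither "+" nor "-" appears in [a-z0-9\._], so preg_match() returns 0.
    printf("%-30s %s\n", $address, preg_match($rexMail, $address) ? 'match' : 'no match');
}
```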

See http://www.regular-expressions.info/email.html for a straightforward expression that will do the trick, along with some explanation of why e-mail address validation usually goes astray when done with regular expressions.

VoteyDisciple
+1  A: 

The problem is how character classes work. A character class such as [a-z0-9._] matches exactly one of the listed characters, and the + quantifier repeats that single-character match one or more times. Nothing in the class's rules prevents the same character, such as a dot, from being matched several times in a row.
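The behavior is easy to reproduce in isolation. In this sketch the pattern is only the local-part character class from the question (a simplification made to keep the example small), anchored to the whole string.

```php
<?php
// Only the local-part class from the question, anchored at both ends.
// Inside a character class "." is a literal dot, so no escaping is needed.
$localPart = '/^[a-z0-9._]+$/i';

// Each repetition of + matches one character from the set independently,
// so nothing stops a dot from following another dot, or from coming first.
foreach (['john.doe', 'john..doe', '.john.'] as $candidate) {
    printf("%-10s %d\n", $candidate, preg_match($localPart, $candidate));
}
```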

The trick is that you want to separate "words" with periods, and you have to take that grouping into account.

Consider something like this:

$rexMail = "/^[a-z0-9_]+(\.[a-z0-9_]+)*@{1}[a-z0-9-_]+\.{1}[a-z]{2,4}\.?[a-z]{0,2}$/i";

This basically says: "one word (letters, digits, or underscores), then, OPTIONALLY and repeated zero or more times, a dot followed by another word." In other words, there can be as many or as few dot-separated words as you like.
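A minimal sketch that checks the grouped pattern against a few hypothetical addresses. Because the pattern starts with a word and every dot must be followed by another word, it rejects doubled, leading, and trailing dots before the @ sign.

```php
<?php
// The grouped pattern from this answer, copied verbatim.
$rexMail = '/^[a-z0-9_]+(\.[a-z0-9_]+)*@{1}[a-z0-9-_]+\.{1}[a-z]{2,4}\.?[a-z]{0,2}$/i';

// Hypothetical sample addresses.
$samples = [
    'john.doe@example.com',  // single dots between words
    'john..doe@example.com', // doubled dot
    '.john@example.com',     // leading dot
    'john.@example.com',     // trailing dot
];

foreach ($samples as $address) {
    printf("%-24s %s\n", $address, preg_match($rexMail, $address) ? 'match' : 'no match');
}
```

Only the first address matches.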

Incidentally, the {1} quantifier is never necessary: a token without a quantifier is matched exactly once by default.
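To confirm that {1} is redundant, the sketch below compares the question's pattern with a copy that has both {1} quantifiers removed (that copy is an edit made for this illustration) over a few hypothetical inputs.

```php
<?php
// The question's pattern, copied verbatim.
$withOne    = '/^[a-z0-9\._]+@{1}[a-z0-9-_]+\.{1}[a-z]{2,4}\.?[a-z]{0,2}$/i';
// Same pattern with "@{1}" written as "@" and "\.{1}" as "\.".
$withoutOne = '/^[a-z0-9\._]+@[a-z0-9-_]+\.[a-z]{2,4}\.?[a-z]{0,2}$/i';

$samples = ['user@example.com', 'user@example.co.uk', 'user@@example.com', 'userexample.com'];

foreach ($samples as $address) {
    // An unquantified token is matched exactly once, so both columns agree.
    printf("%-20s %d %d\n", $address, preg_match($withOne, $address), preg_match($withoutOne, $address));
}
```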

Platinum Azure
Didn't know that about the quantifier, though the regex won't work for longer domains after the @ sign, like mail.company.subcompany.com
WebDevHobo
Very true, it won't. I was just answering his question about the two periods, since people had already beaten me to the validity of the regex itself. Sorry for the confusion. OP, make sure you take that into account if you use the regex I supplied.
Platinum Azure
Oh, and I also wanted to highlight the point about needing to group the "words" of the e-mail address, though as we know that's unnecessary since .. is allowed before the @. But understanding when it might be necessary to think in terms of potential groupings (and later, captures) is a fundamental skill in working with regex, and I want to help the OP discover patterns that allow for or require groupings. Along those lines, one more important thing is to work out your boundary conditions. I placed the period early in the group because I HAD to, and I leave it to the OP to reason why.
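One way to see the boundary issue, offered as a sketch rather than as the commenter's own reasoning: keep the mandatory first word, but move the dot to the end of the repeated group. The first pattern below is the answer's pattern with the {1} quantifiers dropped; the second is a hypothetical variant written for this comparison.

```php
<?php
// Dot at the START of the group (the answer's pattern, minus the {1}s).
$dotFirst = '/^[a-z0-9_]+(\.[a-z0-9_]+)*@[a-z0-9-_]+\.[a-z]{2,4}\.?[a-z]{0,2}$/i';
// Hypothetical variant with the dot at the END of the group instead.
$dotLast  = '/^[a-z0-9_]+([a-z0-9_]+\.)*@[a-z0-9-_]+\.[a-z]{2,4}\.?[a-z]{0,2}$/i';

foreach (['john.doe@example.com', 'john.@example.com'] as $address) {
    printf("%-22s first:%d last:%d\n", $address, preg_match($dotFirst, $address), preg_match($dotLast, $address));
}
```

With the dot last, any repetition of the group ends right before the @ sign, so a local part containing a dot must end in one: john.@example.com is accepted and john.doe@example.com is rejected, the opposite of what is wanted.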
Platinum Azure
A: 

You might find http://www.regular-expressions.info/email.html useful. It is almost impossible to fully validate e-mail addresses with regular expressions alone, but this page covers several alternatives.

I tend to use the one near the bottom of the page. I have found it suitable for most web applications, and I modified it slightly to work with PHP string escaping:

//courtesy of http://www.regular-expressions.info/email.html, modified to be escaped for PHP string and preg_match
$exp = '[a-z0-9!#$%&\'*+\\/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&\'*+\\/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?';
if (! preg_match('/^'.$exp.'$/i', $email)) return false;
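The snippet above assumes it runs inside a function that returns false on failure. Below is a minimal self-contained sketch of such a wrapper; the function name is chosen here for illustration and the sample inputs are hypothetical.

```php
<?php
// Hypothetical wrapper around the expression above (copied verbatim).
function is_valid_email_syntax(string $email): bool
{
    $exp = '[a-z0-9!#$%&\'*+\\/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&\'*+\\/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?';
    return preg_match('/^' . $exp . '$/i', $email) === 1;
}

$samples = [
    'john+newsletter@example.com', // "+" is allowed in the local part
    'mary-jane@example.co.uk',
    'john..doe@example.com',       // doubled dot: rejected
    'user@-example.com',           // label may not start with "-": rejected
];

foreach ($samples as $address) {
    printf("%-30s %s\n", $address, is_valid_email_syntax($address) ? 'valid' : 'invalid');
}
```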
zombat
A: 

You want to make sure the regular expression is as lenient as possible. As it stands, there are many valid addresses it will not match.

Here is a PHP library for e-mail syntax validation: http://code.google.com/p/php-email-address-validation/

It is hard to truly validate an e-mail address without sending a message to it. SMTP validation works in most cases, but a failed check should be treated as a possible failure, not a definite one.

Here is a class that will validate emails via SMTP: http://code.google.com/p/php-smtp-email-validation/
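SMTP checking usually comes down to connecting to the domain's mail server and issuing RCPT TO for the address. Below is a minimal sketch of the interpretation step only, assuming the three-digit reply code has already been read from the server; the function name and its three result labels are hypothetical and not part of the linked class. In line with the advice above, a 5xx reply is reported as only "probably invalid", and even a 2xx reply is no proof, since a catch-all server accepts any recipient.

```php
<?php
// Hypothetical helper: classify the reply code an SMTP server returns for
// RCPT TO. Only the first digit is used: 2 = success, 4 = temporary
// failure, 5 = permanent failure.
function classify_rcpt_reply(int $code): string
{
    $class = intdiv($code, 100); // first digit of the three-digit code
    if ($class === 2) {
        return 'accepted';         // e.g. 250; catch-all servers also say this
    }
    if ($class === 5) {
        return 'probably invalid'; // e.g. 550; may also be an anti-spam policy
    }
    return 'unknown';              // 4xx temporary failure, or anything else
}

foreach ([250, 451, 550] as $code) {
    printf("%d => %s\n", $code, classify_rcpt_reply($code));
}
```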

bucabay