Hey all

I'm trying to create a regular expression that will filter valid emails using PHP and have run into an issue that conflicts with what I understand of regular expressions. Here is the code that I am using.

if (!preg_match('/^[-a-zA-Z0-9_.]+@[-a-zA-Z0-9]+.[a-zA-Z]{2,4}$/', $string)) {
    return false;
}

Now from the materials that I've researched, this should allow content before the @ to be multiple letters, numbers, underscores and periods, then afterwards to allow multiple letters and numbers, then require a period, then two to four letters for the top level domain.

However, right now it ignores the requirement for the top-level domain section. For example, [email protected] is obviously valid (and should be), but a@b is also reported as valid, and I want it to be flagged as invalid.

I'm sure I'm missing something, but after browsing Google for an hour I'm at a loss as to what it could be. Does anyone have an answer for this conundrum?

EDIT: The speed at which answers arrive here makes this site superior to its competitors. Well done!

+5  A: 

Rather than rolling your own, perhaps you should read the article How to Find or Validate an Email Address on Regular-Expressions.info. The article also discusses reasons why you might not want to validate an email address using a regular expression and provides 3 regular expressions that you might consider using instead of your own.

Thomas Owens
+5  A: 

You should escape . when it's not part of a character class: '/^[-a-zA-Z0-9_.]+@[-a-zA-Z0-9]+\.[a-zA-Z]{2,4}$/' Otherwise it matches any character:

  • . - any character (except the newline \n unless the s modifier is used)
  • \. - a literal dot
  • [.] - a literal dot (inside a character class)
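
To see the difference in practice, here is a minimal sketch that runs the same input through both versions of the pattern. The variable names and the sample strings are only for illustration.

```php
<?php
// Unescaped dot: "." matches any single character, not just a period.
$unescaped = '/^[-a-zA-Z0-9_.]+@[-a-zA-Z0-9]+.[a-zA-Z]{2,4}$/';
// Escaped dot: "\." matches only a literal period.
$escaped   = '/^[-a-zA-Z0-9_.]+@[-a-zA-Z0-9]+\.[a-zA-Z]{2,4}$/';

$input = 'a@basd'; // no period after the @

var_dump(preg_match($unescaped, $input));   // int(1): the "." consumed a letter
var_dump(preg_match($escaped, $input));     // int(0): a real period is required
var_dump(preg_match($escaped, 'a@b.com'));  // int(1)
```
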
Ivan Nevostruev
Instead of \., I find [.] to be more readable. It puts the . character into its own group.
Thomas Owens
Agreed, although it didn't make a difference. Both \. and [.] still say that the email passed is valid.
canadiancreed
I've just executed `var_dump(preg_match('/^[-a-zA-Z0-9_.]+@[-a-zA-Z0-9]+\.[a-zA-Z]{2,4}$/', 'a@basd'));` and it prints `int(0)` which is false
Ivan Nevostruev
Yep I found the mistake on my end. My apologies for the erroneous reply earlier.
canadiancreed
[email protected] or [email protected] won't validate with that regular expression.
Mauricio
Yes, it's not perfect. But we've solved the problem
Ivan Nevostruev
A: 

A single dot in a regular expression means "match any character". And that's exactly what it does when a top-level domain is missing (also when it's present, of course).

Thus you should change your code like this:

if (!preg_match('/^[-a-zA-Z0-9_.]+@[-a-zA-Z0-9]+\.[a-zA-Z]{2,4}$/', $string)) {
    return false;
}

And by the way: a lot more characters are allowed in the local part than what your regular expression currently allows for.
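
As a rough illustration of that last point, the sketch below feeds the corrected pattern two addresses whose local parts use characters that the mail standards permit (a plus sign and an apostrophe) but that the character class leaves out. The sample addresses are made up for the example.

```php
<?php
$pattern = '/^[-a-zA-Z0-9_.]+@[-a-zA-Z0-9]+\.[a-zA-Z]{2,4}$/';

// "+" is legal in the local part but is not in the character class.
var_dump(preg_match($pattern, 'user+tag@example.com')); // int(0)
// The apostrophe is legal as well and is rejected for the same reason.
var_dump(preg_match($pattern, "o'brien@example.com"));  // int(0)
// A plain local part still passes.
var_dump(preg_match($pattern, 'user.name@example.com')); // int(1)
```
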

innaM
Agreed on your link. I figured though that I should get this working before I start to get more involved and get in way over my head. Also tried your code. Same result, it does not require a dot and validates without it.
canadiancreed
It works for me. What input did you try?
innaM
+2  A: 

An RFC822-compliant e-mail regex is available.

ceejayoz
Will this work for PHP? I ask because it looks like a Perl module.
canadiancreed
The Perl module just gives you an easy way of running things through that regular expression.
ceejayoz
+1  A: 

This is the most reasonable trade-off between the spec and real life that I have seen:

[a-z0-9!#$%&'*+/=?^_`{|}~-]+
(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*
@
(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+
(?:[A-Z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)\b

Of course, you have to remove the line breaks, and you have to update it if more top-level domains become available.
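
One possible way to use this from PHP is to join the pieces with string concatenation. The sketch below makes a few assumptions that are not part of the expression above: it adds ^ and $ anchors so the whole string must match, uses the i modifier so the lowercase classes also accept uppercase input, and escapes / inside the character class because / is the delimiter. The function name is hypothetical.

```php
<?php
// Hypothetical helper: assembles the pattern from its parts and tests one address.
function matches_tradeoff_pattern(string $email): bool
{
    // One dot-separated piece of the local part.
    $atom = '[a-z0-9!#$%&\'*+\/=?^_`{|}~-]+';

    $pattern = '/^'                                   // assumption: anchor at start
        . $atom . '(?:\.' . $atom . ')*'
        . '@'
        . '(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+'
        . '(?:[A-Z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)\b'
        . '$/i';                                      // assumption: anchor at end, case-insensitive

    return preg_match($pattern, $email) === 1;
}

var_dump(matches_tradeoff_pattern('first.last+tag@sub.example.org')); // bool(true)
var_dump(matches_tradeoff_pattern('user@example'));                   // bool(false)
var_dump(matches_tradeoff_pattern('user@example.xyz'));               // bool(false): TLD not in the list
```
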

John Gietzen
+3  A: 

From the page Comparing E-mail Address Validating Regular Expressions: Geert De Deckere from the Kohana project has developed a near perfect one:

/^[-_a-z0-9\'+*$^&%=~!?{}]++(?:\.[-_a-z0-9\'+*$^&%=~!?{}]+)*+@(?:(?![-.])[-a-z0-9.]+(?<![-.])\.[a-z]{2,6}|\d{1,3}(?:\.\d{1,3}){3})(?::\d++)?$/iD

But there is also a built-in function in PHP, filter_var($email, FILTER_VALIDATE_EMAIL), though it seems to be under development. And there is another serious solution: PEAR:Validate. I think the PEAR solution is the best one.
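
For reference, a minimal sketch of the filter_var() approach. filter_var() returns the filtered value on success and false on failure, so the result is compared strictly against false. The wrapper name is_valid_email is made up for this example.

```php
<?php
// Hypothetical wrapper that turns filter_var()'s result into a boolean.
function is_valid_email(string $email): bool
{
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

var_dump(is_valid_email('user@example.com')); // bool(true)
var_dump(is_valid_email('not-an-email'));     // bool(false)
```
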

powtac
I've run into some limitations of the filter_var one (unlimited top domain sizes for one) so I'll give the PEAR one a shot. Thanks!
canadiancreed
What are "unlimited top domain sizes"? It is my understanding that a TLD can be up to 6 characters (.museum) and a domain can be up to 63 characters.
Martin Hohenberg