Will this email validation allow foreign email addresses, such as Russian, Hebrew and so on? And how can I check for just the @ sign?

Here is the PHP code.

if (preg_match ('/^[\w.-]+@[\w.-]+\.[A-Za-z]{2,6}$/', $_POST['email'])) {
    $email = mysqli_real_escape_string($mysqli, strip_tags($_POST['email']));
} else {
    echo '<p class="error">Please enter a valid email address!</p>';
}
+4  A: 

Short answer: There is no good regular expression for this problem.

Ryan Bigg
+2  A: 

No, it will not allow email addresses that contain foreign characters.

To check only for @ in a string you can use strpos:

if(strpos($str,'@') !== false) 
  echo "$str has @";
codaddict
+2  A: 

It's generally a bad idea to regex check email addresses.

Sure, you can use some super complicated expression that catches 99% of invalid addresses, but at the end of the day, if a user doesn't want to enter a proper email address, they'll just enter a perfectly valid but non-existent address.

Charlie Somerville
+1  A: 

Firstly, when you're talking about email addresses at this level, you need to be a lot more specific about the data you are talking about. You seem to be trying to match a single ADDR-SPEC (RFC 3696, RFC 5322). According to the RFC, the following is an email address:

 "hello world" <[email protected]>, !$&*-=^`|~#%'+/?_{}@example.com, "Abc@def"@example.com

It consists of 3 ADDR-SPECs.

Note that there are lots of local-part characters which won't match your regex. Your domain matching is also too loose: the character class [\w.-] accepts any run of word characters, dots and hyphens, and the literal '.' inside it is the only reason the pattern can cope with a domain of more than two parts, e.g. @mail.something.co.jp

RFC 4952 suggests using UTF-8 throughout SMTP, whereas RFC 3490 (IDNA) uses Punycode for internationalized domain names. When I last dug into this in any depth, I found that many MUAs were using Punycode for the local part.

The following regex comes close to fully implementing the RFC2822 standard:

 /[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/gi

But in practice I've found this works for me:

/^[a-z0-9\._%+!$&*=^|~#%'`?{}/\-]+@([a-z0-9\-]+\.){1,}([a-z]{2,6})$/gi
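To use the second pattern with PHP's preg_match, two small adjustments seem necessary: the "/" inside the character class has to be escaped because "/" is also the delimiter, and the "g" flag has to go, since PHP's PCRE functions do not accept it. The following is only a sketch; the helper name looks_like_email and the sample addresses are made up for illustration. The duplicate "%" and the redundant escapes are dropped as well, which should not change what the pattern matches.

```php
<?php
// Hypothetical helper (not from the answer): the pragmatic pattern above,
// adapted for preg_match. "/" is escaped inside the class and only the
// "i" modifier is kept, because PHP's PCRE functions do not accept "g".
function looks_like_email(string $candidate): bool
{
    $pattern = '/^[a-z0-9._%+!$&*=^|~#\'`?{}\/-]+@(?:[a-z0-9-]+\.)+[a-z]{2,6}$/i';

    // preg_match returns 1 on a match, 0 on no match and false on error.
    return preg_match($pattern, $candidate) === 1;
}

var_dump(looks_like_email('user@mail.something.co.jp')); // multi-part domain
var_dump(looks_like_email("o'brien+tag@example.com"));   // unusual local part
var_dump(looks_like_email('example.com'));               // no "@" at all
```

Like the original, this still rejects quoted local parts and any TLD longer than six letters, so it is a typo filter rather than an RFC check.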

C.

symcbean
Nice and educated approach
Álvaro G. Vicario
+1  A: 

Instead of trying to write your own regular expression or a string tokenizer, you can simply use PHP's filter_var function (assuming you're using PHP >= 5.2):

<?php
// this will set $email to '[email protected]'
$email = filter_var('[email protected]', FILTER_VALIDATE_EMAIL);

// this will set $email to bool(false)
$email = filter_var('example.com', FILTER_VALIDATE_EMAIL);
?>

With this function you can also check other kinds of values; see the manual for the other filter constants.
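As a rough illustration of those other constants, here is a short sketch using three validation filters that ship with the filter extension. The input values are invented for the example.

```php
<?php
// Each call returns the (possibly converted) value on success and false on failure.
$url = filter_var('http://example.com/path', FILTER_VALIDATE_URL);
$int = filter_var('42', FILTER_VALIDATE_INT);          // converted to an integer
$ip  = filter_var('192.168.0.1', FILTER_VALIDATE_IP);
$bad = filter_var('not a number', FILTER_VALIDATE_INT);

var_dump($url, $int, $ip, $bad);
```

Because a valid result can itself be falsy (the integer 0, for instance), it is safer to compare the return value against false with === than to test it with a bare if.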

Cassy
+1  A: 

E-mail validation is actually pretty tricky.

For example, foo\@[email protected] and test/[email protected] are valid email addresses. Most preg_match patterns will fail with addresses like these.

You need a function that checks for the "@" sign while ignoring escaped "@" signs, and that also performs specific length (strlen) checks. Furthermore, you have to check whether the local part of the email contains two consecutive dots ".." ([email protected] is an invalid email), or whether it starts or ends with a dot ([email protected] is also invalid), and so on...
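The checks listed above could be sketched roughly as follows. This is not a complete validator: the function name email_structure_ok is hypothetical, the split on the last "@" is a simplification standing in for real escape handling, and the 64 and 255 octet limits are the local-part and domain limits from RFC 5321.

```php
<?php
// Hypothetical helper: structural checks only, no character whitelist.
function email_structure_ok(string $email): bool
{
    // Split on the LAST "@", so an "@" inside the local part
    // (escaped or quoted) does not confuse the split.
    $at = strrpos($email, '@');
    if ($at === false || $at === 0 || $at === strlen($email) - 1) {
        return false; // no "@", empty local part, or empty domain
    }

    $local  = substr($email, 0, $at);
    $domain = substr($email, $at + 1);

    // Length limits (assumption: RFC 5321 maxima, counted in bytes).
    if (strlen($local) > 64 || strlen($domain) > 255) {
        return false;
    }

    // The dot rules only apply to an unquoted local part.
    $quoted = strlen($local) >= 2 && $local[0] === '"' && substr($local, -1) === '"';
    if (!$quoted) {
        if ($local[0] === '.' || substr($local, -1) === '.' || strpos($local, '..') !== false) {
            return false;
        }
    }

    return true;
}

var_dump(email_structure_ok('john.doe@example.com'));
var_dump(email_structure_ok('john..doe@example.com'));
var_dump(email_structure_ok('foo\@bar@example.com'));
```

Even this much code leaves the domain syntax and the allowed character set unchecked.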

What I'm trying to say is that email validation isn't just two lines of code.

Mikk
A: 

I normally use this expression:

preg_match('/^[a-z0-9\._-]+@[a-z0-9\._-]+\.(xn--)?[a-z0-9]{2,}$/i', $email)

It's not very strict (it only seeks to find obvious mistypes) and it supports internationalized domain names, provided that they use the plain ASCII representation (such as xn--lvaro-wqa.es rather than álvaro.es). It should not be difficult to get the plain ASCII version of the e-mail address. The algorithm is called Punycode and there's a PEAR package that claims to handle it.
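As an alternative to the PEAR package, the domain part can probably be converted with idn_to_ascii() from PHP's intl extension. The sketch below swaps in that function, assumes the extension is loaded and the input is UTF-8, and uses a hypothetical helper name; the local part is left untouched.

```php
<?php
// Hypothetical helper: convert only the domain of an address to its
// ASCII (Punycode) form. Requires the intl extension (assumption).
// Returns the converted address, or false if conversion is not possible.
function email_domain_to_ascii(string $email)
{
    $at = strrpos($email, '@');
    if ($at === false || $at === strlen($email) - 1) {
        return false; // no "@" or empty domain
    }

    $local  = substr($email, 0, $at);
    $domain = substr($email, $at + 1);

    // UTS #46 is the variant current PHP versions expect.
    $ascii = idn_to_ascii($domain, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);

    return $ascii === false ? false : $local . '@' . $ascii;
}

var_dump(email_domain_to_ascii('info@álvaro.es'));
```

The result can then be passed to the expression above, whose optional xn-- prefix is there to accept such converted names.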

Most answers say you should not validate e-mail addresses. I don't agree with that. Every piece of data is subject to some sort of validation. People make mistakes when typing. What should not be done is using regular expressions to make sure users are not faking addresses; that's pointless and will eventually annoy legitimate users. There are other tools for that.

Update

Symcbean's excellent answer has made me realise that my regexp is probably excluding valid addresses; it's too restrictive in the account name itself.

Álvaro G. Vicario