Now that ICANN is allowing non-Latin-character domain names, should I be concerned about e-mail validation? Currently, my sites use PHP functions to ensure that each segment of an email address contains only alphanumeric characters. Will other character sets, such as Cyrillic, Arabic, and Chinese, pass validation? Are there recommended PHP functions for this?

A: 

I was going to recommend using filter_var() with the FILTER_VALIDATE_EMAIL filter, but after a Google search it turns out it doesn't support multi-byte characters yet. It looks like, for now, your best bet is to strip out the non-Latin characters (or replace them with their Latin equivalents) and perform the usual validation against the result. Note that checkdnsrr() will obviously fail, since you've changed the domain. If you use it to verify the MX records of the email's domain, you will need to disable that check temporarily.
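To illustrate the limitation, here is a minimal sketch comparing an ASCII address with one that has a non-ASCII domain. Both addresses are made up for the example, and the behaviour shown assumes the stock filter extension that ships with PHP.

```php
<?php
// Both addresses are hypothetical examples.
$ascii = filter_var('user@example.com', FILTER_VALIDATE_EMAIL);
$idn   = filter_var('user@bücher.example', FILTER_VALIDATE_EMAIL);

// filter_var() returns the address on success and false on failure.
var_dump($ascii !== false); // bool(true)
var_dump($idn !== false);   // bool(false): the non-ASCII domain is rejected
```

So an address can be rejected here purely because its domain has not been converted to ASCII, not because it is malformed.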

John Conde
FILTER_VALIDATE_EMAIL also appears to be excessively strict even when dealing with single-byte characters.
El Yobo
+1  A: 

I think the best way in the long run would be to use a proper IDN function to convert the domain part of the incoming string into an ACE string (xn--xyz-blah.com). If the conversion works, the domain name is syntactically valid. If it doesn't, it isn't.

There is a PHP function named idn_to_ascii() that does this, but it needs additional libraries: it is provided by the intl extension (or PECL idn). You'd have to see whether it is available on your system.
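A rough sketch of that approach, assuming PHP 7.1+ and the intl extension. The helper name is_plausible_idn_email() is made up for this example, and it only handles a non-ASCII domain; the local part still has to pass the ordinary ASCII check.

```php
<?php
// Hypothetical helper, not a built-in. Returns true/false for a verdict,
// or null when idn_to_ascii() is unavailable and no verdict is possible.
function is_plausible_idn_email(string $email): ?bool
{
    // Split at the last "@": everything after it is treated as the domain.
    $at = strrpos($email, '@');
    if ($at === false || $at === 0 || $at === strlen($email) - 1) {
        return false;
    }
    $local  = substr($email, 0, $at);
    $domain = substr($email, $at + 1);

    // idn_to_ascii() only exists when the intl (or PECL idn) extension is loaded.
    if (!function_exists('idn_to_ascii')) {
        return null;
    }

    // Convert the domain to its ACE form (xn--...); false means the conversion failed.
    $ace = idn_to_ascii($domain, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    if ($ace === false) {
        return false;
    }

    // With an ASCII-only domain, the usual filter can be applied again.
    return filter_var($local . '@' . $ace, FILTER_VALIDATE_EMAIL) !== false;
}

var_dump(is_plausible_idn_email('user@bücher.example'));
```

The converted ACE domain is also what you would hand to checkdnsrr() if you want to keep the MX lookup.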

There also seems to be an external Linux command named idn that does IDN conversions. I don't know anything further about it, though.

If you want to use PHP built-in methods only, delfuego provides a regular expression in this question that looks very good.

Pekka