I'm building a website using Django. The website could have a significant number of users from non-English-speaking countries.

I just want to know whether there are any technical restrictions on what types of characters an email address can contain.

Are email addresses only allowed to contain English letters + numbers + "_" + "@" + "."?

Are they allowed to contain non-English letters like "é" or "ü"?

Are they allowed to contain Chinese or Japanese or other Unicode characters?

+4  A: 

Well, yes. Read (at least) this article from Wikipedia.

I live in Argentina, and addresses like ñoñó[email protected] are allowed here.

eKek0
+1  A: 

Non-ASCII email addresses are possible, as shown by this RFC: http://tools.ietf.org/html/rfc3490. Note that RFC 3490 (IDNA) covers internationalized domain names, so it applies to the part after the '@'. I think this has not been set up for all countries, and from what I understand only one language code will be allowed for each country. There is also a way to turn such a name into ASCII, but that won't be trivial.
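For what it's worth, here is a minimal sketch of the ASCII conversion for the domain part, using the "idna" codec built into Python (which follows the older IDNA 2003 rules from RFC 3490). The function names are made up for illustration, and the local part is left untouched because this conversion only applies to domain names.

```python
def domain_to_ascii(domain: str) -> str:
    """Convert an internationalized domain name to its ASCII (Punycode) form."""
    # The built-in "idna" codec implements the RFC 3490 ToASCII operation.
    return domain.encode("idna").decode("ascii")


def address_with_ascii_domain(address: str) -> str:
    """Hypothetical helper: convert only the part after the last '@'."""
    local, _, domain = address.rpartition("@")
    return f"{local}@{domain_to_ascii(domain)}"


print(domain_to_ascii("bücher.example"))  # xn--bcher-kva.example
```

A domain that is already plain ASCII comes back unchanged, so the helper can be applied to every address.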

James Black
+2  A: 

The allowed syntax in an email address is described in RFC 3696, and is pretty involved.

The exact rule [for the local part; the part before the '@'] is that any ASCII character, including control characters, may appear quoted, or in a quoted string. When quoting is needed, the backslash character is used to quote the following character.
[...]
Without quotes, local-parts may consist of any combination of alphabetic characters, digits, or any of the special characters ! # $ % & ' * + - / = ? ^ _ ` . { | } ~
[...]
Any characters, or combination of bits (as octets), are permitted in DNS names. However, there is a preferred form that is required by most applications...

...and so on, in some depth.
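As a rough illustration of the unquoted rule quoted above, a check in Python might look like the following. It is only a sketch: the name `is_plain_unquoted_local_part` is made up here, and it deliberately ignores quoted local parts, length limits, and any further constraints the RFC places on where "." may appear.

```python
import string

# Special characters the quoted RFC 3696 passage allows in an unquoted local part.
UNQUOTED_SPECIALS = "!#$%&'*+-/=?^_`.{|}~"
ALLOWED = set(string.ascii_letters + string.digits + UNQUOTED_SPECIALS)


def is_plain_unquoted_local_part(local: str) -> bool:
    """True if every character is an ASCII letter, digit, or listed special."""
    return bool(local) and all(ch in ALLOWED for ch in local)
```

Note that this rejects non-ASCII local parts such as "ñoñó", so it shows what that passage describes, not what mail providers accept in practice.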

Michael Petrotta
+3  A: 

Instead of worrying about what email addresses can and can't contain, which you really don't care about, test whether your setup can send them email or not—this is what you really care about! This means actually sending a verification email.

Otherwise, you can't catch the much more common case of accidental typos that stay within whatever character set you devise. (Quick: is [email protected] a valid address for me to use at your site, or not?) It also avoids gratuitously alienating users by telling them their perfectly valid and correct address is wrong. You still may not be able to process some addresses (this is necessary alienation), as the other answers say: email address processing isn't trivial; but that's something they need to find out if they want to provide you with an email address!
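A minimal sketch of preparing such a verification email with only the Python standard library is below. The function name, sender address, and verification URL are placeholders I invented; actually sending the message (for example with Django's `django.core.mail.send_mail`) and storing the token against the user are left out.

```python
import secrets
from email.message import EmailMessage


def build_verification_email(address: str,
                             base_url: str = "https://example.com/verify"):
    """Return (token, message); the caller stores the token and sends the message."""
    token = secrets.token_urlsafe(32)  # unguessable, URL-safe token
    msg = EmailMessage()
    msg["From"] = "noreply@example.com"  # placeholder sender
    msg["To"] = address
    msg["Subject"] = "Please confirm your email address"
    msg.set_content(
        f"Open this link to confirm your address:\n{base_url}?token={token}\n"
    )
    return token, msg
```

The address only counts as verified once the user visits the link and the token matches the one you stored.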

All you should check is that the user supplies some text before an @, some text after it, and the address isn't outrageously long (say 1000 characters). If you want to provide a warning ("this looks like trouble! is there a typo? double-check before continuing"), that's fine, but it shouldn't block the add-email-address process.
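The check described above could be sketched in Python roughly like this. The function name and the constant are my own, and splitting at the last "@" is an assumption on my part, made because a quoted local part may itself contain an "@".

```python
MAX_ADDRESS_LENGTH = 1000  # the "outrageously long" cutoff suggested above


def looks_like_email(address: str) -> bool:
    """Loose sanity check: some text, an '@', some more text, sane length."""
    if len(address) > MAX_ADDRESS_LENGTH:
        return False
    # rpartition returns ("", "", address) when there is no "@" at all.
    local, sep, domain = address.rpartition("@")
    return bool(sep and local and domain)
```

This accepts addresses such as "ñoñó@example.com" and leaves the real validation to the verification email.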

Of course, if you don't care to ever send email to them, then just take whatever they enter. For example, the address might solely be used for Gravatar, but Gravatar verifies all email addresses anyway.

Roger Pate