views:

198

answers:

5
"Françoise Lefèvre"@example.com

I'm reading RFC 5321 to try to actually understand what constitutes a valid email address -- and I'm probably making this a lot more difficult than it needs to be -- but this has been bugging me.

               i.e., within a quoted string, any
               ASCII graphic or space is permitted
               without blackslash-quoting except
               double-quote and the backslash itself.

Does this mean that ASCII extended character sets are valid within quotes? Or does that imply standard ASCII table only?

EDIT - With the answers in mind, here's a simple jQuery validator that could work in supplement to the the plugin's built-in email validation to check the characters.

jQuery.validator.addMethod("ascii_email", function( value, element ) { 
    // In compliance with RFC 5321, this allows all standard printing ASCII characters in quoted text.
    // Unquoted text must be ASCII-US alphanumeric or one of the following: ! # $ % & ' * + - / = ? ^ _ ` { | } ~   
    // @ and . get a free pass, as this is meant to be used together with the email validator

    var result = this.optional(element) || 
        (
            /^[\u002a\u002b\u003d\u003f\u0040\u0020-\u0027\u002d-u002f\u0030-\u0039\u0041-\u005a\u005e-\u007e]+$/.test(value.replace(/(["])(?:\\\1|.)*?\1/, "")) &&     
            /^[\u0020-\u007e]+$/.test(value.match(/(["])(?:\\\1|.)*?\1/, ""))   
        );
    return result;
}, "Invalid characters");

The plugin's built-in validation appears to be pretty good, except for catching invalid characters. Out of the test cases listed here it only disallows comments, folding whitespace and addresses lacking a TDL (ie: @localhost, @255.255.255.255) -- all of which I can easily live without.

+2  A: 

According to this MSDN page the extended ASCII characters aren't valid, currently, but there is a proposed specification that would change this.

http://msdn.microsoft.com/en-us/library/system.net.mail.mailaddress(VS.90).aspx

The important part is here:

Thomas Lee is correct in that a quoted local part is valid in an email address and certain mail addresses may be invalid if not in a quoted string. However, the characters that others of you have mentioned such as the umlaut and the agave are not in the ASCII character set, they are extended ASCII. In RFC 2822 (and subsequent RFC's 5322 and 3696) the dtext specification (allowed in quoted local parts) only allows most ASCII values (RFC 2822, section 3.4.1) which includes values in ranges from 33-90 and 94-126. RFC 5335 has been proposed that would allow non-ascii characters in the addr-spec, however it is still labeled as experimental and as such is not supported in MailAddress.

James Black
+1  A: 

Technically yes, but read on:

While the above definition for Local-part is relatively permissive,
for maximum interoperability, a host that expects to receive mail SHOULD avoid defining mailboxes where the Local-part requires (or uses) the Quoted-string form or where the Local-part is case- sensitive.

...

Systems MUST NOT define mailboxes in such a way as to require the use in SMTP of non-ASCII characters.

fredley
+3  A: 

In this RFC, ASCII means US-ASCII , i.e., no characters with a value greater than 127 are allowed. As a proof, here are some quotes from RFC 5321:

The mail data may contain any of the 128 ASCII character codes, [...]

[...]

Systems MUST NOT define mailboxes in such a way as to require the use in SMTP of non-ASCII characters (octets with the high order bit set to one) or ASCII "control characters" (decimal value 0-31 and 127). These characters MUST NOT be used in MAIL or RCPT commands or other commands that require mailbox names.

These quotes quite clearly imply that characters with a value greater than 127 are considered non-ASCII. Since such characters are explicitly forbidden in MAIL TO or RCPT commands, it is impossible to use them for e-mail addresses.

Thus, "Francoise Lefevre"@example.com is a perfectly valid address (according to the RFC), whereas "Françoise Lefèvre"@example.com is not.

Heinzi
A: 

It's not valid by RFC 5322 3.4 due to the è charecter

HOWEVER, these "wacky" emails are not going to work in the real world. Most email systems will either purposefully reject the email based on the idea that it's used by spammers, or be designed without knowledge of RFC. (for example, cpanel on Apache rejects proper RFC email creation).

A: 

The HTML5 spec has an interesting take on the issue of valid email addresses:

A valid e-mail address is a string that matches the ABNF production 1*( atext / "." ) "@" ldh-str 1*( "." ldh-str ) where atext is defined in RFC 5322 section 3.2.3, and ldh-str is defined in RFC 1034 section 3.5.

The nice thing about this, of course, is that you can then take a look at the open source browser's source code for validating it (look for the IsValidEmailAddress function). Of course it's in C, but not too hard to translate to JS.

robertc