I work on a server that processes email, and as part of that we do some MIME parsing and encoding. An issue recently came up with a message that is otherwise valid but contains a Latin-1 character in a MIME header. The sender addressed the message to multiple recipients, and one of the addresses they typed contained a Latin-1 character. The SMTP envelope contains only the valid recipients, but the To: line still contains the invalid address as a raw, improperly encoded string.

It was my impression that this is illegal, and that MIME headers are required to be 7-bit. 8-bit values in MIME headers have to be encoded in the form

=?charset?encoding?encoded text?=
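For illustration, here is a minimal sketch of producing and reading that form with Python's standard `email.header` module. The helper names `encode_display_name` and `decode_first_word` are hypothetical, and the sample text is only a display name, not an address.

```python
from email.header import Header, decode_header

def encode_display_name(name: str, charset: str = "iso-8859-1") -> str:
    """Return `name` as 7-bit header text, using an encoded-word if needed."""
    return Header(name, charset=charset).encode()

def decode_first_word(value: str) -> str:
    """Decode the first chunk of a header value back to a str."""
    raw, charset = decode_header(value)[0]
    if isinstance(raw, bytes):
        # Encoded-words come back as bytes plus the charset they declared.
        return raw.decode(charset or "ascii")
    return raw  # plain ASCII text with no encoded-word

encoded = encode_display_name("Changé")
print(encoded)                     # 7-bit safe =?charset?encoding?text?= form
print(decode_first_word(encoded))  # the original text
```

The encoded result contains only ASCII characters, which is what lets it travel in a header that must be 7-bit.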

The header in question is something like this:

To: <changé[email protected]>, <[email protected]>

My question is: Is this valid MIME and I just don't know about it?

+1  A: 

RFC 822 says:

 address     =  mailbox                      ; one addressee
 mailbox     =  addr-spec                    ; simple address
 addr-spec   =  local-part "@" domain        ; global address
 local-part  =  word *("." word)             ; uninterpreted
 word        =  atom / quoted-string     
 atom        =  1*<any CHAR except specials, SPACE and CTLs>
 CHAR        =  <any ASCII character>        ; (  0-177,  0.-127.)

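Got it? A local-part may contain only ASCII, so the raw é is not allowed. Note that the =?charset?encoding?encoded text?= form is not a "quoted-string", and RFC 2047 does not permit it inside an addr-spec anyway; encoded-words are only allowed in display-name phrases, comments, and unstructured fields such as Subject.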
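As a rough sketch of the grammar quoted above, the check below tests whether a local-part is made only of RFC 822 atoms separated by dots. The function names are hypothetical, and it deliberately ignores the quoted-string alternative for a word, so it is stricter than the full grammar.

```python
# RFC 822 "specials": characters that may not appear in an atom.
SPECIALS = set('()<>@,;:\\".[]')

def is_rfc822_atom(text: str) -> bool:
    """True if text is 1 or more ASCII chars, excluding specials, SPACE and CTLs."""
    if not text:
        return False
    for ch in text:
        code = ord(ch)
        if code > 127:                 # outside CHAR (ASCII 0-127)
            return False
        if code <= 31 or code == 127:  # CTL
            return False
        if ch == " " or ch in SPECIALS:
            return False
    return True

def is_atom_local_part(local_part: str) -> bool:
    """Check local-part = word *("." word), treating every word as an atom."""
    return all(is_rfc822_atom(word) for word in local_part.split("."))

print(is_atom_local_part("change"))   # ASCII only
print(is_atom_local_part("changé"))   # contains a Latin-1 character
```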

Andrey
Yeah, I definitely know the address is invalid, but I'm not attempting to parse the To: line itself for interpretation. I'm just wondering if it's even valid to have non-ASCII characters in a MIME header at all.
Shawn D.
@Shawn D. The RFC is clear: no.
Andrey
+1  A: 

From RFC 2822, Internet Message Format, section 2.2, Header Fields:

Header fields are lines composed of a field name, followed by a colon (":"), followed by a field body, and terminated by CRLF. A field name MUST be composed of printable US-ASCII characters (i.e., characters that have values between 33 and 126, inclusive), except colon. A field body may be composed of any US-ASCII characters, except for CR and LF. However, a field body may contain CRLF when used in header "folding" and "unfolding" as described in section 2.2.3. All field bodies MUST conform to the syntax described in sections 3 and 4 of this standard.

Therefore, any non-ASCII characters are illegal.
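A server that wants to detect this condition could scan the raw header block for bytes above 127. The sketch below is one possible way to do that; the function name `find_8bit_header_lines` and the sample message with its `example.com` addresses are made up for illustration, and it assumes the message uses CRLF line endings.

```python
from typing import List, Tuple

def find_8bit_header_lines(raw_message: bytes) -> List[Tuple[int, bytes]]:
    """Return (line number, line) for each header line containing a byte > 127."""
    # Headers end at the first empty line; the body is not inspected.
    header_block = raw_message.split(b"\r\n\r\n", 1)[0]
    offenders = []
    for lineno, line in enumerate(header_block.split(b"\r\n"), start=1):
        if any(byte > 127 for byte in line):
            offenders.append((lineno, line))
    return offenders

# Hypothetical message with a raw Latin-1 byte (0xE9) in the To: header.
sample = (
    b"From: sender@example.com\r\n"
    b"To: <chang\xe9@example.com>, <other@example.com>\r\n"
    b"Subject: hello\r\n"
    b"\r\n"
    b"Body text with \xe9 is not checked here.\r\n"
)

for lineno, line in find_8bit_header_lines(sample):
    print(lineno, line)
```

Working on bytes rather than decoded text avoids guessing a charset for a header that never declared one.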

Ignacio Vazquez-Abrams