Hi, I have an email subject of the form:

=?utf-8?B?T3.....?=

The body of the email is UTF-8, base64 encoded, and it decodes fine. I am currently using Perl's Email::MIME module to decode the email.

What is the meaning of the =?utf-8 delimiter and how do I extract information from this string?

+12  A: 

The header is parsed as follows:

=?<charset>?<encoding>?<encoded text>?=

In this case the charset is utf-8 and the encoding is B, which means base64 (the other option is Q, a variant of quoted-printable). A long header may contain several such encoded-words, separated by whitespace.

To read it, first decode the base64, then treat it as utf-8 characters.
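As a rough illustration of those two steps, here is a minimal sketch that does the decoding by hand for a single B-encoded word. The helper name decode_b_word and the sample subject are made up for this example; it does not handle Q encoding or headers with several encoded-words, so for real mail a dedicated module is the better choice.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);
use Encode qw(decode);

# Hypothetical helper: decode one "=?charset?B?data?=" word by hand.
# Returns undef if the input is not a single B-encoded word.
sub decode_b_word {
    my ($word) = @_;

    # Split the word into its three fields: charset, encoding, payload.
    my ($charset, $encoding, $payload) =
        $word =~ /\A=\?([^?]+)\?([BbQq])\?([^?]*)\?=\z/
        or return;
    return unless uc($encoding) eq 'B';

    # Step 1: base64 gives back the raw bytes.
    my $bytes = decode_base64($payload);

    # Step 2: interpret those bytes in the declared charset.
    # (Encode croaks here if it does not know the charset.)
    return decode($charset, $bytes);
}

# Made-up sample: base64 of the UTF-8 bytes of "Hello wörld".
my $subject = '=?utf-8?B?SGVsbG8gd8O2cmxk?=';
my $text    = decode_b_word($subject);

binmode(STDOUT, ':encoding(UTF-8)');
print "$text\n";
```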

For more detail, read the email-related Internet RFCs, in particular RFC 2047.

Since you are using Perl, it looks like Encode::MIME::Header will be of use:

ABSTRACT

This module implements RFC 2047 Mime Header Encoding. There are 3 variant encoding names; MIME-Header, MIME-B and MIME-Q. The difference is described below

              decode()          encode()  
MIME-Header   Both B and Q      =?UTF-8?B?....?=  
MIME-B        B only; Q croaks  =?UTF-8?B?....?=  
MIME-Q        Q only; B croaks  =?UTF-8?Q?....?=
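
To illustrate the first row of that table, the sketch below decodes the same text from a B word and from a Q word using the MIME-Header encoding name. Both sample strings are made up for this example; Encode should load Encode::MIME::Header on demand when the name is used.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Made-up samples: the same text, "Hello wörld", as a B and as a Q encoded-word.
# In Q encoding "_" stands for a space and "=XX" for a byte in hex.
my $b_word = '=?utf-8?B?SGVsbG8gd8O2cmxk?=';
my $q_word = '=?utf-8?Q?Hello_w=C3=B6rld?=';

# MIME-Header accepts either form and returns a Perl character string.
my $from_b = decode('MIME-Header', $b_word);
my $from_q = decode('MIME-Header', $q_word);

print "both forms decode to the same text\n" if $from_b eq $from_q;
```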
1800 INFORMATION
+2  A: 

Check out RFC2047. The 'B' means that the part between the last two '?'s is base64-encoded. The 'utf-8' naturally means that the decoded data should be interpreted as UTF-8.

marijne
+1  A: 

This is a standard extension for charset labeling of headers, specified in RFC2047.

wnoise
+5  A: 

I think that the Encode module handles that with the MIME-Header encoding, so try this:

use Encode qw(decode);
# $encoded holds the raw header value, e.g. '=?utf-8?B?T3.....?='
my $decoded = decode("MIME-Header", $encoded);
moritz
That was helpful, thanks. By the way, I also used print encode('utf-8', $headers_decoded) to display the decoded headers properly, in case someone else reads this while writing a mail script.
mhambra
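
For anyone combining the two tips above, here is a minimal end-to-end sketch: decode the header into a character string, then encode it back to UTF-8 bytes just before printing. The sample header value is made up, and the variable names are only illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Made-up raw Subject value: base64 of the UTF-8 bytes of "Hello wörld".
my $encoded = '=?utf-8?B?SGVsbG8gd8O2cmxk?=';

# Turn the encoded-word into a Perl character string.
my $decoded = decode('MIME-Header', $encoded);

# Encode to UTF-8 bytes before writing to a plain (byte-oriented) handle;
# printing $decoded directly can trigger "Wide character" warnings or
# show the wrong characters.
my $bytes = encode('utf-8', $decoded);
print $bytes, "\n";
```

An alternative with the same effect is to set an output layer once, with binmode(STDOUT, ':encoding(UTF-8)'), and print the character string directly.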