Hi stackers!

I'm reading email from a maildir and some emails have weird sets of characters in them:

=3D
=09

I think =3D is = and =09 is a space. There are some others, but I'm not sure:

=E2
=80
=93

Does anyone know what these are and what encoding issues I'm dealing with here?

BTW, I tried fetching these emails via POP3 and it's the same thing. I'm posting this on SO because I'm not using a regular mail client to read the data: I'm reading the maildir files directly with PHP. Perhaps a regular email client would detect what encoding this is and deal with it.

Thanks!

+3  A: 

That looks like quoted-printable encoding.

This is a way of encoding 8-bit data so it can be sent over a medium that is not 8-bit clean, i.e. one that may not preserve the high bit of each byte. Any byte that can't be sent as-is is written as = followed by its two-digit hex value. In the olden days, some mail servers did not preserve all 8 bits of a byte.
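To make that concrete for the sequences in the question: =3D is a literal "=", =09 is a tab (not a space), and =E2=80=93 is three bytes that together form an en dash (U+2013) once read as UTF-8. Here is a minimal sketch in Python using the standard `quopri` module; PHP's `quoted_printable_decode` does the same byte-level step. The helper name `decode_qp` and the UTF-8 default are my assumptions, since the real charset comes from the message's Content-Type header.

```python
import quopri


def decode_qp(raw: bytes, charset: str = "utf-8") -> str:
    """Undo quoted-printable, then interpret the bytes in the given charset.

    `decode_qp` is a hypothetical helper; utf-8 is an assumed default.
    """
    # Step 1: turn every =XX escape back into the single byte 0xXX.
    decoded_bytes = quopri.decodestring(raw)
    # Step 2: the result is still bytes; decode with the message charset.
    return decoded_bytes.decode(charset)


print(repr(decode_qp(b"a=3Db")))      # '=' was escaped as =3D
print(repr(decode_qp(b"x=09y")))      # =09 is a tab character
print(repr(decode_qp(b"=E2=80=93")))  # three bytes, one UTF-8 character
```

Note the two separate steps: quoted-printable only gives you the bytes back, and the charset decides what characters those bytes mean.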

  • If you're seeing these in the message source but not in your email client, then this is normal.

  • If you're seeing these in your email client then something is messed up in whatever software the sender is using - most likely, the Content-Transfer-Encoding header has not been properly specified (which tells the email client how to decode it).

If you're writing an email client and want to be able to deal with this, you'll need to read the Content-Transfer-Encoding header. Of course, if you're doing that, you're also going to come up against multipart messages/attachments, base64, and much more.
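As a rough sketch of what "read the header and decode accordingly" looks like, here is one way to do it with Python's standard `email` package, which handles quoted-printable, base64, and multipart for you. The sample message, the addresses, and the helper name `extract_text_parts` are all made up for illustration; this is not a complete mail client.

```python
from email import message_from_bytes, policy

# A made-up single-part message with a quoted-printable body.
RAW = (
    b"From: sender@example.com\r\n"
    b"Subject: Demo\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"x=3D1 =E2=80=93 done\r\n"
)


def extract_text_parts(raw: bytes):
    """Return (transfer_encoding, decoded_text) for each text/* part."""
    msg = message_from_bytes(raw, policy=policy.default)
    results = []
    for part in msg.walk():
        # Multipart containers hold other parts and have no body themselves.
        if part.is_multipart():
            continue
        if part.get_content_maintype() != "text":
            continue
        # A missing header means the body is plain 7bit text.
        cte = str(part.get("Content-Transfer-Encoding", "7bit")).lower()
        # get_content() applies the transfer encoding and the charset.
        results.append((cte, part.get_content()))
    return results


for cte, text in extract_text_parts(RAW):
    print(cte, repr(text))
```

The same loop works unchanged on multipart messages, because `walk()` visits every nested part and each part carries its own Content-Transfer-Encoding.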

thomasrutter
Thanks for the stunningly fast reply! I used quoted_printable_decode to take care of this. Basically, it's a simple script to take emails that were sent to a certain email address (that is on a mailing list from mailchimp) and turn them into html files. Works fine now. Cheers!
sims