I am currently writing a program, and one of its functions is to extract the HTML part of a multipart email.

That part works fine. However, some of the characters are encoded in a way that I can't figure out, e.g.

',' into '=2C'
';' into '=3B'
'=' into '=3D'

It also puts seemingly random '=' characters all over the place.

Does anyone know if there is a decoder for this (or even what the encoding is called)? I have handled a few of these replacements in code, but there are probably plenty more that I am missing because I haven't come across them yet. So I would like to either identify the encoding so I can decode everything, or find a library that already does this.

P.S. I am sending the email from a Hotmail account, in case that is the reason.

+2  A: 

This is called quoted-printable encoding.

Unfortunately, the existing QuotedPrintableStream class from Microsoft is internal, so you cannot use it. However, you could use the one from the Mono project, or one from any library that handles MIME.
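
To illustrate what a quoted-printable decoder does with the sequences from the question, here is a minimal sketch. It is written in Python rather than .NET, purely because Python ships a decoder in its standard library (`quopri`). The helper name `decode_quoted_printable`, the sample input, and the assumption that the text is UTF-8 are all made up for illustration.

```python
import quopri


def decode_quoted_printable(raw: bytes, charset: str = "utf-8") -> str:
    """Decode quoted-printable bytes and interpret the result as text.

    The charset is an assumption here; in a real email it comes from the
    part's Content-Type header.
    """
    return quopri.decodestring(raw).decode(charset)


# Hypothetical sample: '=2C', '=3B' and '=3D' are escaped characters, and
# the '=' directly before the line break is a soft line break that the
# decoder removes (this is where the "random" '=' signs come from).
sample = b"a=2Cb=3Bc=3Dd and a long line that was wrap=\r\nped by the mailer"

decoded = decode_quoted_printable(sample)
print(decoded)
```

Each `=XX` is simply the hexadecimal value of one byte, so a generic decoder covers every case at once instead of replacing sequences one by one.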

Lucero
+1  A: 

It is quoted-printable encoding; it is explained in this RFC. Let me warn you before you spend too much time on your task: parsing emails can turn into a real headache, so you should not do it yourself. Try this free library; it is the best one I have ever seen (and I have seen a lot of them): http://www.lumisoft.ee/lswww/download/downloads/Net/
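
As a rough illustration of why a MIME library saves work, the sketch below extracts the HTML part of a multipart message and lets the library undo the quoted-printable encoding. It uses Python's standard `email` package rather than the .NET library linked above, only to keep the example self-contained. The function name `extract_html` and the sample message `RAW` are hypothetical.

```python
from email import message_from_bytes, policy


def extract_html(raw_message: bytes):
    """Return the decoded HTML body of a message, or None if there is none."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    part = msg.get_body(preferencelist=("html",))
    # get_content() applies the Content-Transfer-Encoding (for example
    # quoted-printable) and the charset declared on the part.
    return part.get_content() if part is not None else None


# Hypothetical two-part message; the HTML part is quoted-printable encoded.
RAW = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/alternative; boundary="sep"\r\n'
    b"\r\n"
    b"--sep\r\n"
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b"\r\n"
    b"Hello, world\r\n"
    b"--sep\r\n"
    b'Content-Type: text/html; charset="utf-8"\r\n'
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b'<p style=3D"color:red=3B">Hello=2C world</p>\r\n'
    b"--sep--\r\n"
)

html = extract_html(RAW)
print(html)
```

The caller never touches boundaries, transfer encodings, or charsets directly, which is where hand-written parsers tend to break on real-world mail.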

Andrey
Thanks for the parser, unfortunately I'm almost done :/
Immanu'el Smith
@Emmanuel Smith Well... I really doubt that you are almost done, because you may be underestimating the effort this task needs. Just try your parser against a variety of different emails.
Andrey