I am writing a console application that reads emails from several mailboxes and processes them. The emails are received from various automated systems. The messages are logged and/or forwarded.

The problem is that some emails are encoded in UTF-8 and transfer-encoded as quoted-printable, which messes up special characters (mainly ä, ö and å). I have not found any way to convert them into a readable format.

For example, "ä" in quoted-printable is "=C3=A4". With the usual conversion methods the result is "Ã¤" (gibberish).

I shamelessly ripped this example conversion table from here: http://forums.sun.com/thread.jspa?threadID=5315363

char   codepoint          UTF-8 encoding                 as Latin-1

ä      11100100 = E4      11000011 10100100 = C3 A4      Ã¤ = \u00C3\u00A4
å      11100101 = E5      11000011 10100101 = C3 A5      Ã¥ = \u00C3\u00A5
ö      11110110 = F6      11000011 10110110 = C3 B6      Ã¶ = \u00C3\u00B6

Ä      11000100 = C4      11000011 10000100 = C3 84      Ã? = \u00C3\u0084
Å      11000101 = C5      11000011 10000101 = C3 85      Ã? = \u00C3\u0085
Ö      11010110 = D6      11000011 10010110 = C3 96      Ã? = \u00C3\u0096
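To make the table concrete, here is a minimal sketch that reproduces its columns with the standard `System.Text.Encoding` classes. The class and method names (`TableDemo`, `MisreadAsLatin1`) are made up for illustration, and "ISO-8859-1" is assumed to be the charset that was wrongly applied.

```csharp
using System;
using System.Text;

static class TableDemo
{
    // Hypothetical helper: returns the text you get when the UTF-8 bytes
    // of `s` are wrongly interpreted as Latin-1 (ISO-8859-1).
    public static string MisreadAsLatin1(string s)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(s);
        return Encoding.GetEncoding("ISO-8859-1").GetString(utf8Bytes);
    }

    static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;
        foreach (char c in "äåöÄÅÖ")
        {
            byte[] utf8Bytes = Encoding.UTF8.GetBytes(c.ToString());
            // Columns: character, code point, UTF-8 bytes, Latin-1 misreading.
            Console.WriteLine("{0}  U+{1:X4}  {2}  {3}",
                c, (int)c, BitConverter.ToString(utf8Bytes), MisreadAsLatin1(c.ToString()));
        }
    }
}
```

For the upper-case letters the second misread character is a control character (U+0084, U+0085, U+0096), which is why it shows up as "?" or not at all.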

So how do I get the real codepoint from the UTF-8 value? I'd rather not use any external libraries. Besides, I've already tried a couple and they failed.

+3  A: 

I'm not completely sure, but this might do the trick:

Encoding.ASCII.GetString(Encoding.UTF8.GetBytes(yourString))

I'm not on my computer right now so I can't test it, but I'll try it later.
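A related sketch, in case the string has already been decoded with the wrong charset and contains "Ã¤" where "ä" was meant. It assumes the wrong charset was ISO-8859-1, which maps every byte to exactly one character, so the original bytes can be recovered and decoded again as UTF-8. Going through `Encoding.ASCII` is unlikely to work here, because ASCII covers only 7-bit values and anything above that is replaced with "?". The names `MojibakeRepair` and `Repair` are made up for illustration.

```csharp
using System;
using System.Text;

static class MojibakeRepair
{
    // Hypothetical helper: undoes a Latin-1 misreading of UTF-8 data.
    // Assumes `misdecoded` was produced by decoding UTF-8 bytes as ISO-8859-1.
    public static string Repair(string misdecoded)
    {
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        byte[] originalBytes = latin1.GetBytes(misdecoded); // back to the raw bytes
        return Encoding.UTF8.GetString(originalBytes);      // decode them correctly
    }

    static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(Repair("\u00C3\u00A4")); // the two characters "Ã¤" become "ä"
    }
}
```

If the text was misread as Windows-1252 instead, some bytes (0x84, 0x85, 0x96 among them) map to different characters, and the round trip would need that encoding instead.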

Leandro López
A: 

You need to convert from UTF-8 to Latin1 - after doing the quoted-printable conversion.

http://msdn.microsoft.com/en-us/library/66sschk1.aspx looks promising.
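As a rough sketch of the order of operations: decode the quoted-printable text to raw bytes first, then decode those bytes with the charset from the message's Content-Type header (UTF-8 is assumed here). The `QuotedPrintable` class and its `Decode` method are made up for illustration and are not part of the framework. The sketch handles `=XX` escapes and soft line breaks only, not RFC 2047 encoded headers.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

static class QuotedPrintable
{
    // Hypothetical helper: turns quoted-printable text into bytes,
    // then decodes the bytes with the given charset.
    public static string Decode(string input, Encoding charset)
    {
        var bytes = new List<byte>(input.Length);
        byte value;
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c != '=')
            {
                // A quoted-printable body is assumed to be pure ASCII,
                // so literal characters map to themselves.
                bytes.Add((byte)c);
                continue;
            }
            // Soft line break: "=" directly before a line ending is dropped.
            if (i + 2 < input.Length && input[i + 1] == '\r' && input[i + 2] == '\n') { i += 2; continue; }
            if (i + 1 < input.Length && input[i + 1] == '\n') { i += 1; continue; }
            // "=XX" escape: two hex digits give one byte.
            if (i + 2 < input.Length &&
                byte.TryParse(input.Substring(i + 1, 2), NumberStyles.AllowHexSpecifier,
                              CultureInfo.InvariantCulture, out value))
            {
                bytes.Add(value);
                i += 2;
            }
            else
            {
                bytes.Add((byte)'='); // malformed escape: keep the "=" as it is
            }
        }
        return charset.GetString(bytes.ToArray());
    }

    static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(Decode("=C3=A4", Encoding.UTF8)); // decodes to "ä"
    }
}
```

The important point is that the bytes are collected before any charset is applied. Decoding each `=XX` escape to a character on its own is what produces "Ã¤".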

Douglas Leeder
Good pointer. I'll have to check that one out too.
Leandro López
A: 

From the effects you describe, I guess you get the emails by connecting directly to POP3 mailboxes? If so, you get the emails in their raw form, and most of them will probably be in MIME format.

MIME (Wikipedia has a good overview) is a rather large and complex standard, and implementing a MIME parser that reliably handles all the cases you want covered could very well take you a few weeks.

I'd therefore consider using a third-party MIME library that does the job for you.

Andreas Huber