I am writing a console application that reads emails from several mailboxes and processes them. The emails come from various automated systems, and each message is logged and/or forwarded.
The problem is that some emails are encoded in UTF-8 and transfer-encoded as quoted-printable, which messes up special characters (mainly ä, ö and å). I have not found a way to convert them into a readable format.
For example, "ä" in quoted-printable is "=C3=A4". With the normal conversion methods the result is "Ã¤" (gibberish): the two UTF-8 bytes C3 A4 are each turned into a separate character.
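To make the symptom concrete, here is a minimal sketch in Python (chosen only for illustration; it is not the language of my application). It assumes the standard-library `quopri` module for the quoted-printable step, and the variable names are made up for this example. It shows that the decoded bytes are fine and that only the choice of charset decides whether the result is readable.

```python
import quopri

# Undo the quoted-printable transfer encoding: "=C3=A4" becomes two raw bytes.
raw = quopri.decodestring(b"=C3=A4")

# Interpreting the two bytes as UTF-8 gives the single intended character.
as_utf8 = raw.decode("utf-8")

# Interpreting the same bytes as Latin-1 gives one character per byte,
# which is the gibberish described above.
as_latin1 = raw.decode("latin-1")

print(raw.hex(), len(as_utf8), len(as_latin1))
```

So the quoted-printable step only yields bytes; the charset declared for the message (UTF-8 here) still has to be applied to those bytes afterwards.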
I shamelessly ripped this example conversion table from here: http://forums.sun.com/thread.jspa?threadID=5315363
char  codepoint      UTF-8 encoding             as Latin-1
ä     11100100 = E4  11000011 10100100 = C3 A4  Ã¤ = \u00C3\u00A4
å     11100101 = E5  11000011 10100101 = C3 A5  Ã¥ = \u00C3\u00A5
ö     11110110 = F6  11000011 10110110 = C3 B6  Ã¶ = \u00C3\u00B6
Ä     11000100 = C4  11000011 10000100 = C3 84  Ã? = \u00C3\u0084
Å     11000101 = C5  11000011 10000101 = C3 85  Ã? = \u00C3\u0085
Ö     11010110 = D6  11000011 10010110 = C3 96  Ã? = \u00C3\u0096
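The rows of the table can be reproduced with a short Python sketch. The helper name `table_row` is hypothetical and exists only for this illustration; it relies on nothing beyond the built-in `str.encode` and `bytes.decode`.

```python
def table_row(ch):
    """Return (char, codepoint, UTF-8 bytes, Latin-1 misreading) as strings."""
    code = ord(ch)                      # the real codepoint, e.g. 0xE4
    utf8 = ch.encode("utf-8")           # the bytes that travel in the email
    misread = utf8.decode("latin-1")    # what appears when each byte is one char

    codepoint_col = f"{code:08b} = {code:02X}"
    utf8_col = (" ".join(f"{b:08b}" for b in utf8)
                + " = " + " ".join(f"{b:02X}" for b in utf8))
    latin1_col = "".join(f"\\u{ord(c):04X}" for c in misread)
    return ch, codepoint_col, utf8_col, latin1_col


for ch in "\u00e4\u00e5\u00f6\u00c4\u00c5\u00d6":
    print(*table_row(ch), sep="  ")
```

For the three uppercase letters the second byte (84, 85, 96) falls in the Latin-1 control range, which is why the table shows a "?" there instead of a printable character.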
So how do I get the real codepoint from the UTF-8 bytes? I'd rather not use any external libraries. Besides, I've tried a couple already and they failed.
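To pin down what "getting the real codepoint" means for the two-byte sequences in the table, here is a rough sketch of the bit arithmetic in Python, using no external libraries. The function names `decode_two_byte` and `repair_mojibake` are made up for this example, and the sketch covers only two-byte UTF-8 sequences (it does not reject overlong forms or handle longer sequences).

```python
def decode_two_byte(b0, b1):
    """Combine a two-byte UTF-8 sequence (110xxxxx 10xxxxxx) into a codepoint."""
    if (b0 & 0xE0) != 0xC0 or (b1 & 0xC0) != 0x80:
        raise ValueError("not a two-byte UTF-8 sequence")
    # Keep the low 5 bits of the lead byte and the low 6 bits of the
    # continuation byte, then join them into one 11-bit value.
    return ((b0 & 0x1F) << 6) | (b1 & 0x3F)


def repair_mojibake(text):
    """Undo a UTF-8-read-as-Latin-1 mix-up; assumes that is what happened."""
    return text.encode("latin-1").decode("utf-8")


print(hex(decode_two_byte(0xC3, 0xA4)))
print(repair_mojibake("\u00c3\u00a4") == "\u00e4")
```

In practice a platform's built-in UTF-8 decoder does the same arithmetic for every sequence length, so the manual version is mainly useful for checking the table by hand.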