
So given this input string:

=?ISO-8859-1?Q?TEST=2C_This_Is_A_Test_of_Some_Encoding=AE?=

And this function:

private string DecodeSubject(string input)
{
    StringBuilder sb = new StringBuilder();
    MatchCollection matches = Regex.Matches(input, @"=\?(?<encoding>[\S]+)\?.\?(?<data>[\S]+[=]*)\?=");
    foreach (Match m in matches)
    {
        string encoding = m.Groups["encoding"].Value;
        string data = m.Groups["data"].Value;

        Encoding enc = Encoding.GetEncoding(encoding.ToLower());
        if (enc == Encoding.UTF8)
        {
            byte[] d = Convert.FromBase64String(data);
            sb.Append(Encoding.ASCII.GetString(d));
        }
        else
        {
            byte[] bytes = Encoding.Default.GetBytes(data);
            string decoded = enc.GetString(bytes);
            sb.Append(decoded);
        }
    }

    return sb.ToString();
}

The result is just the raw data extracted from the input string. What am I doing wrong that this text isn't being decoded properly?

UPDATE

So I have this code for decoding quoted-printable:

public string DecodeQuotedPrintable(string encoded)
{
    byte[] buffer = new byte[1];
    return Regex.Replace(encoded, "=(\r\n?|\n)|=([A-F0-9]{2})", delegate(Match m)
    {
        if (byte.TryParse(m.Groups[2].Value, NumberStyles.HexNumber, CultureInfo.InvariantCulture, out buffer[0]))
        {
            return Encoding.ASCII.GetString(buffer);
        }
        else
        {
            return string.Empty;
        }
    });
}

That just leaves the underscores. Do I manually convert those to spaces (Replace("_", " ")), or is there something else I need to do to handle that?

+2  A: 
  1. The function's not even trying to decode the quoted-printable encoded stuff (the hex codes and underscores). You need to add that.
  2. It's handling the encoding wrong (UTF-8 gets decoded with Encoding.ASCII for some bizarre reason).
Matti Virkkunen
And the test string is, in fact, in quoted-printable format.
driis
+2  A: 

It looks like you don't fully understand the format of the input line. Check RFC 2047 (http://www.ietf.org/rfc/rfc2047.txt); the format is:

encoded-word = "=?" charset "?" encoding "?" encoded-text "?="

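So you have to:

  1. Extract the charset (the Encoding, in .NET terms), not just assume UTF-8 or Encoding.Default. (Encoding.Default is not UTF-16; on .NET Framework it is the system's ANSI code page.)
  2. Extract the encoding: either B for Base64 or Q for quoted-printable (your case!).
  3. Decode the encoded text to bytes, and only then convert those bytes to a string using the charset.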
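A minimal C# sketch of those three steps, assuming Encoding.GetEncoding recognizes the charset name (ISO-8859-1 and UTF-8 are built in; other charsets may need a registered encoding provider on .NET Core). EncodedWordDecoder, Decode and DecodeQ are hypothetical names. The sketch does not drop whitespace between adjacent encoded-words, and an unknown charset makes GetEncoding throw.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical sketch of an RFC 2047 encoded-word decoder:
//   encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
public static class EncodedWordDecoder
{
    // charset and encoded-text contain no '?' or whitespace; encoding is a single B or Q.
    private static readonly Regex EncodedWord = new Regex(
        @"=\?(?<charset>[^?\s]+)\?(?<encoding>[BbQq])\?(?<text>[^?\s]*)\?=");

    public static string Decode(string input)
    {
        return EncodedWord.Replace(input, m =>
        {
            // Step 1: the charset names the .NET Encoding for the final bytes -> string step.
            Encoding charset = Encoding.GetEncoding(m.Groups["charset"].Value);
            string text = m.Groups["text"].Value;

            // Step 2: the encoding letter decides how encoded-text becomes bytes.
            byte[] bytes = char.ToUpperInvariant(m.Groups["encoding"].Value[0]) == 'B'
                ? Convert.FromBase64String(text)
                : DecodeQ(text);

            // Step 3: only now turn the bytes into a string, using the declared charset.
            return charset.GetString(bytes);
        });
    }

    // "Q" encoding: "=XY" is the byte with hex value XY, "_" is byte 0x20 (space),
    // and any other character stands for its own ASCII byte.
    private static byte[] DecodeQ(string text)
    {
        var bytes = new List<byte>(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '_')
            {
                bytes.Add(0x20);
            }
            else if (c == '=' && i + 2 < text.Length
                     && byte.TryParse(text.Substring(i + 1, 2), NumberStyles.HexNumber,
                                      CultureInfo.InvariantCulture, out byte b))
            {
                bytes.Add(b);
                i += 2; // skip the two hex digits just consumed
            }
            else
            {
                bytes.Add((byte)c);
            }
        }
        return bytes.ToArray();
    }
}
```

With the string from the question, EncodedWordDecoder.Decode should return "TEST, This Is A Test of Some Encoding®": =2C becomes a comma, each underscore a space, and byte 0xAE is read as "®" under ISO-8859-1.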
Andrey
Please see the updated question, thanks.
Jason Miesionczek
Converting to a string should be the last operation. First convert your text to bytes: ordinary characters are cast directly to bytes, and each =XY becomes the byte with hex value XY.
Andrey
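Following that advice, the question's regex-based DecodeQuotedPrintable can be adapted to yield bytes instead of per-byte ASCII text. This is a sketch with hypothetical names (QBytesFirst, DecodeQToBytes, Decode), assuming the charset from the encoded-word is passed in separately. It also covers the underscores, because RFC 2047's Q encoding defines "_" as byte 0x20 (a space), so a plain replacement with a space is correct there.

```csharp
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class QBytesFirst
{
    // ISO-8859-1 maps chars U+0000..U+00FF one-to-one onto bytes 0x00..0xFF,
    // so a string can carry raw bytes through Regex.Replace without loss.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Hypothetical helper: "=XY" -> byte 0xXY, "_" -> 0x20, other chars -> their ASCII byte.
    public static byte[] DecodeQToBytes(string encodedText)
    {
        string byteString = Regex.Replace(encodedText, "=([0-9A-Fa-f]{2})|_", m =>
            m.Value == "_"
                ? " " // "_" means byte 0x20 in the Q encoding
                : ((char)byte.Parse(m.Groups[1].Value, NumberStyles.HexNumber,
                                    CultureInfo.InvariantCulture)).ToString());
        return Latin1.GetBytes(byteString);
    }

    // Converting to a string is the last step, and it uses the declared charset.
    public static string Decode(string encodedText, string charset)
    {
        return Encoding.GetEncoding(charset).GetString(DecodeQToBytes(encodedText));
    }
}
```

The original version fails on =AE because it hands byte 0xAE to Encoding.ASCII, which cannot represent it. Routing the bytes through an ISO-8859-1 string is only a convenience for reusing Regex.Replace; collecting them into a List<byte> would work just as well.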