Our website has files in a few different languages: French, Spanish, Portuguese, and English. When a user uploads a file whose name contains special characters such as ó, ç, or ã, I get an error when I call return File(data, "application/octet-stream", name); in MVC. The exception is:

System.FormatException: An invalid character was found in the mail header.

I found an MSDN article showing how to set the MailMessage to UTF-8 encoding to avoid this, but I do not know how to UTF-8 encode the file name when using the MVC File action result. I also found an article on the net showing how to UTF-8 encode a string, but when I try it I get a garbage name, so I guess I do not understand what UTF-8 encoding is supposed to do to the string. Here is the sample code from that blog post ("An invalid character was found in the mail header"):

    // Requires: using System.Text;
    public static string GetCleanedFileName(string s)
    {
        // Percent-encode every character of the name as its UTF-8 bytes.
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            sb.Append(EncodeChar(c));
        }
        return sb.ToString();
    }

    private static string EncodeChar(char chr)
    {
        var sb = new StringBuilder();
        byte[] bytes = Encoding.UTF8.GetBytes(chr.ToString());
        foreach (byte b in bytes)
        {
            // "x2" pads to two hex digits, so bytes below 0x10 still form a valid %xx escape.
            sb.AppendFormat("%{0:x2}", b);
        }
        return sb.ToString();
    }
A: 
TGadfly
This converts perfectly to UTF-8 and back to a string, with the characters that cause the exception exactly as they were before conversion. So I still have the issue, because the exception occurs when you pass a file name containing those characters to the File action result.
Kevin McPhail
So "Representação França - Cialne - Rhodimet NP 99 - PR.pdf" becomes: 82 101 112 114 101 115 101 110 116 97 195 167 195 163 111 32 70 114 97 110 195 167 97 32 45 32 67 105 97 108 110 101 32 45 32 82 104 111 100 105 109 101 116 32 78 80 32 57 57 32 45 32 80 82 46 112 100 102, which is not usable if I return it as the file name. And if I convert and then convert back, I am right back where I started, with the original file name and the invalid characters.
Kevin McPhail
A: 

I think I have an idea: you have to convert your string not to UTF-8 but to UTF-16, because UTF-8 is, I think, built around ASCII.

UTF-16 represents most characters using two bytes (characters outside the Basic Multilingual Plane take four). UTF-8 uses the one-byte ASCII encodings for ASCII characters and represents non-ASCII characters using variable-length encodings. Keep in mind that while UTF-8 can save space for Western languages, which is an argument often used by proponents, it can actually use three or even four bytes per character for other languages.

And the symbols you wrote are not ASCII.
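
To make the size difference concrete, here is a minimal sketch that counts the bytes a string takes in each encoding. The class and method names (EncodingSizes, Utf8ByteCount, Utf16ByteCount) are made up for illustration; only Encoding.UTF8, Encoding.Unicode and GetByteCount come from the framework.

```csharp
using System.Text;

public static class EncodingSizes
{
    // Hypothetical helper: number of bytes the text occupies when encoded as UTF-8.
    public static int Utf8ByteCount(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }

    // Hypothetical helper: number of bytes the text occupies when encoded as UTF-16.
    // Encoding.Unicode is the framework's name for little-endian UTF-16.
    public static int Utf16ByteCount(string text)
    {
        return Encoding.Unicode.GetByteCount(text);
    }
}
```

With these helpers, "a" takes 1 byte in UTF-8 and 2 in UTF-16, while "ç" takes 2 bytes in both. Neither encoding is ASCII-only, so switching from one to the other does not by itself make a file name safe for a header that only accepts ASCII.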

TGadfly
A: 

The garbage string you are referring to is the encoded version of the characters. You should only see the "garbage" version in the response. When saved to disk the OS should resolve the encoding, and actually display the correct character.

When we are streaming the file down to the client, the code looks something like this:

public ActionResult SaveFile(string fileName)
{
    var content = GetContent();
    // Percent-encode the name so the header only carries ASCII characters.
    fileName = CleanFileName(fileName);

    var ms = new MemoryStream();
    var writer = new StreamWriter(ms, Encoding.UTF8);

    writer.Write(content);
    writer.Flush();   // push the buffered text into the stream
    ms.Position = 0;  // rewind so the response is read from the start
    return File(ms, "text/csv", fileName);
}

When I try to save a file called "ó or ç or ã", you will see %c3%b3%20%6f%72%20%c3%a7%20%6f%72%20%c3%a3 in the response.

However, when the file is saved to disk, this is what the OS displays: ó or ç or ã
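
As a rough sketch of the same idea using a built-in call instead of a hand-written encoder: Uri.EscapeDataString percent-encodes the UTF-8 bytes of a string, and Uri.UnescapeDataString reverses it. The class and method names below are hypothetical. The filename*=UTF-8'' form in BuildContentDisposition is the extended parameter syntax from RFC 5987/RFC 6266, and emitting that header by hand is an assumption here, not something the code above does. The sketch also assumes .NET 4.5 or later, where EscapeDataString escapes everything outside the unreserved set.

```csharp
using System;

public static class FileNameEncoding
{
    // Percent-encodes every character outside A-Z, a-z, 0-9, '-', '.', '_', '~'
    // as the hex values of its UTF-8 bytes.
    public static string Encode(string fileName)
    {
        return Uri.EscapeDataString(fileName);
    }

    // Reverses Encode: turns the %xx escapes back into the original characters.
    public static string Decode(string encoded)
    {
        return Uri.UnescapeDataString(encoded);
    }

    // Hypothetical helper: builds a Content-Disposition value in the
    // RFC 5987 "filename*" form, which keeps the header itself pure ASCII.
    public static string BuildContentDisposition(string fileName)
    {
        return "attachment; filename*=UTF-8''" + Encode(fileName);
    }
}
```

Here Encode("ó or ç or ã") gives %C3%B3%20or%20%C3%A7%20or%20%C3%A3, and Decode returns the original string. Unlike the character-by-character encoder, plain ASCII letters are left readable and only the spaces and accented characters are escaped.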

Joe