Hi, I need to convert a CSV file from ISO-8859-1 to UTF-8 so that the accents are kept in the database.

French accented characters (é, è, ê, and the like) are not preserved when I try to convert them to UTF-8; they are changed to "?".

I'm stumped.

I use the following function for the translation:

public static string iso8859ToUnicode(string src) {
    Encoding iso = Encoding.GetEncoding("iso8859-1");
    Encoding unicode = Encoding.UTF8;
    byte[] isoBytes = iso.GetBytes(src);
    byte[] unibytes = Encoding.Convert(iso, unicode, isoBytes);
    char[] unichars = new char[iso.GetCharCount(unibytes, 0, unibytes.Length)];
    unicode.GetChars(unibytes, 0, unibytes.Length, unichars, 0);
    return new string(unichars);
}

But it doesn't seem to work well. Help?

A: 

You might be losing your encoding when you declare the new string, or when you store the data in the char array.

MasterMax1313
I shouldn't be losing the encoding that way, as I'm converting the iso to bytes, then the bytes to utf-8... Unless there is byte-level automatic character conversion that I'm not aware of, it shouldn't be the problem.
MrZombie
A: 

Instead of the GetChars() method, can't you just call this?

unicode.GetString(unibytes);
Eoin Campbell
+5  A: 

I strongly suspect that your original string doesn't have the correct values. My guess is that you've read it from the file as if it were UTF-8.
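A minimal sketch of that suspicion, for illustration only: it takes the ISO-8859-1 byte for "é" and decodes it as if it were UTF-8. The class and variable names are hypothetical, and the final step assumes the Latin-1 encoding is using its default fallback, which substitutes "?" for characters it cannot represent.

```csharp
using System;
using System.Text;

public static class MisdecodeDemo
{
    public static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding(28591); // ISO-8859-1

        // "é" (U+00E9) is the single byte 0xE9 in ISO-8859-1.
        byte[] fileBytes = latin1.GetBytes("\u00E9");

        // Wrong: 0xE9 on its own is not valid UTF-8, so the decoder
        // emits the replacement character U+FFFD instead of "é".
        string wrong = Encoding.UTF8.GetString(fileBytes);

        // Right: decode with the encoding the bytes were written in.
        string right = latin1.GetString(fileBytes);

        Console.WriteLine(wrong.IndexOf('\uFFFD') >= 0); // is the accent already lost?
        Console.WriteLine(right == "\u00E9");            // is the accent preserved?

        // Assumption: with the default encoder fallback, U+FFFD cannot be
        // represented in ISO-8859-1 and is written as '?' (0x3F), which
        // would match the symptom described in the question.
        byte[] reencoded = latin1.GetBytes(wrong);
        Console.WriteLine(reencoded[0] == (byte)'?');
    }
}
```

If the string is already damaged when it reaches the conversion function, no amount of re-encoding afterwards can bring the original character back, which is why the fix has to happen where the file is read.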

To convert between two encodings, you shouldn't have the string in the first place - you should basically load the bytes of the file and call Encoding.Convert() that way. Alternatively, load the file using ISO-Latin-1 and just save it as UTF-8. For example:

public static void ConvertLatin1ToUtf8(string inputFile, string outputFile)
{
    Encoding latin1 = Encoding.GetEncoding(28591);
    string text = File.ReadAllText(inputFile, latin1);
    File.WriteAllText(outputFile, text, Encoding.UTF8);
}

or

public static void ConvertLatin1ToUtf8(string inputFile, string outputFile)
{
    Encoding latin1 = Encoding.GetEncoding(28591);
    byte[] latinBytes = File.ReadAllBytes(inputFile);
    byte[] utf8Bytes = Encoding.Convert(latin1, Encoding.UTF8, latinBytes);
    File.WriteAllBytes(outputFile, utf8Bytes);
}
Jon Skeet
Thank you a million times and a half. Is it okay for me to hate encoding issues? :P
MrZombie
Only if I can hate time zone issues more :)
Jon Skeet