Hi,

How can I read a Chinese text file using C#? My current code doesn't display the correct characters:

try
{    
    using (StreamReader sr = new StreamReader(path,System.Text.Encoding.UTF8))
    {
        // This is an arbitrary size for this example.
        string c = null;

        while (sr.Peek() >= 0)
        {
            c = null;
            c = sr.ReadLine();
            Console.WriteLine(c);
        }
    }
}
catch (Exception e)
{
    Console.WriteLine("The process failed: {0}", e.ToString());
}
+1  A: 

Use Encoding.Unicode instead.

I think you need to change the OutputEncoding of the Console to display it correctly.
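
A minimal sketch of the console side, assuming UTF-8 is an acceptable output encoding for your console. The class and method names are made up for illustration, and whether the glyphs actually render still depends on the console font:

```csharp
using System;
using System.Text;

static class ConsoleEncodingDemo
{
    // Hypothetical helper: switches the console to UTF-8 output.
    public static void UseUtf8Console()
    {
        // Without this, characters outside the console's default
        // code page are typically printed as '?'.
        Console.OutputEncoding = Encoding.UTF8;
    }

    public static void Main()
    {
        UseUtf8Console();
        // The two characters of "Chinese" (zhongwen), written as escapes so
        // the encoding of this source file does not matter.
        Console.WriteLine("\u4e2d\u6587");
    }
}
```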

leppie
+5  A: 

You need to use the right encoding for the file. Do you know what that encoding is? It might be UTF-16 (Encoding.Unicode), or possibly something like Big5. You should really try to find out for sure rather than guessing.

As leppie's answer mentioned, the problem might also be the capabilities of the console. To find out for sure, dump the string's Unicode character values out as numbers. See my article on debugging Unicode issues for more information and a useful method for dumping the contents of a string.
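
As a rough illustration of dumping the character values (this is not the method from the article; UnicodeDump and Describe are names invented for this sketch), something along these lines prints each UTF-16 code unit as a number:

```csharp
using System;
using System.Text;

static class UnicodeDump
{
    // Formats each UTF-16 code unit of the string as U+XXXX, space-separated.
    public static string Describe(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.Append("U+").Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Correctly decoded Chinese text mostly falls in U+4E00..U+9FFF.
        // Lots of U+FFFD values suggest the bytes were decoded with the
        // wrong encoding, so the console is not the (only) problem.
        Console.WriteLine(Describe("\u4e2d\u6587")); // prints: U+4E2D U+6587
    }
}
```

If the numbers look right but the console still shows garbage, the file is being read correctly and only the display is at fault.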

I would also avoid using the code you're currently using for reading a file line by line. Instead, use something like:

using (StreamReader sr = new StreamReader(path, appropriateEncoding))
{
    string line;
    while ( (line = sr.ReadLine()) != null)
    {
        // ...
    }
}

Peek() can return -1 when the underlying stream does not support seeking, even if there is more data to read. That may be fine for files, but not for all streams. Also look into File.ReadAllText and File.ReadAllLines if you just want the whole file in memory; they're very handy utility methods.
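
A short sketch of those two utility methods, assuming the file is small enough to load at once. The class name and the default of UTF-8 are placeholders; substitute the file's real encoding:

```csharp
using System;
using System.IO;
using System.Text;

static class ReadWholeFile
{
    public static void Main(string[] args)
    {
        string path = args[0];              // path to the text file
        Encoding encoding = Encoding.UTF8;  // placeholder: use the file's actual encoding

        // Reads every line into memory at once.
        string[] lines = File.ReadAllLines(path, encoding);
        Console.WriteLine("{0} lines", lines.Length);

        // Reads the whole file as a single string.
        string text = File.ReadAllText(path, encoding);
        Console.WriteLine("{0} characters", text.Length);
    }
}
```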

Jon Skeet
A: 

For Simplified Chinese the encoding is usually GB2312, and for Traditional Chinese it is usually Big5:

// gb2312 (codepage 936) :
System.Text.Encoding.GetEncoding(936)

// Big5 (codepage 950) :
System.Text.Encoding.GetEncoding(950)
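
Plugged into a StreamReader, that could look roughly like the sketch below. LegacyChineseReader and ReadWithCodePage are illustrative names, and the provider registration is an assumption about your runtime: it is needed on .NET Core / .NET 5+ but not on the classic .NET Framework.

```csharp
using System;
using System.IO;
using System.Text;

static class LegacyChineseReader
{
    // Decodes a whole file using the given Windows code page (936 or 950 above).
    public static string ReadWithCodePage(string path, int codePage)
    {
        // On .NET Core / .NET 5+ code pages such as 936 and 950 are unavailable
        // until this provider is registered. On the classic .NET Framework they
        // are built in and this line can be dropped.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding encoding = Encoding.GetEncoding(codePage);
        using (StreamReader sr = new StreamReader(path, encoding))
        {
            return sr.ReadToEnd();
        }
    }

    public static void Main(string[] args)
    {
        // Usage: pass the file path; 936 assumes Simplified Chinese.
        Console.WriteLine(ReadWithCodePage(args[0], 936));
    }
}
```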

didier