
I read the text file line by line, and so far so good. I'm using this:

    using (StreamReader sr = new StreamReader(this._inFilePath))
    {
        string line;
        int index = 0;

        // Read and display lines from the file until the end of
        // the file is reached:
        while ((line = sr.ReadLine()) != null)
        {
            // skip empty lines
            if (line == "")
            {
                continue;
            }

            // ... process the non-empty line (and update index) here ...
        }
    }

It has now come to my attention that I may need to convert the file to Unicode after reading it. How is that done? Does anyone use the Convert class for this?

A: 

You cannot convert text to Unicode after reading it, since by then it will already be a string, containing actual characters mapped to Unicode code points. What you're doing in your code example is reading the file as Unicode, since that is the default StreamReader behavior: it decodes the file's bytes (as UTF-8, unless a byte order mark says otherwise) into a .NET string.

What makes you think you have to convert anything? Is the text corrupted?

bzlm
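To illustrate that the "conversion" happens while decoding, here is a minimal sketch that decodes the same two bytes in two ways. It assumes the degree sign was saved by an editor using a single-byte "ANSI" code page, where "°C" is stored as 0xB0 0x43; the class and member names are hypothetical. Code page 28591 (ISO-8859-1) is used because it ships with every .NET runtime, and 0xB0 maps to "°" in both it and Windows-1252.

```csharp
using System.Text;

// Hypothetical demo class: the encoding has to be chosen when the bytes are
// decoded; there is nothing left to "convert" once you have a string.
public static class DecodeDemo
{
    // "°C" as written by a single-byte ANSI/Latin-1 editor.
    public static readonly byte[] AnsiDegreeC = { 0xB0, 0x43 };

    // UTF-8 is StreamReader's default. 0xB0 is not a valid UTF-8 lead byte,
    // so the decoder substitutes U+FFFD (the replacement character).
    public static string AsUtf8(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes);
    }

    // Decoding with the single-byte encoding the bytes were written in
    // (ISO-8859-1 here) yields U+00B0, the degree sign.
    public static string AsLatin1(byte[] bytes)
    {
        return Encoding.GetEncoding(28591).GetString(bytes);
    }
}
```

`DecodeDemo.AsUtf8(DecodeDemo.AnsiDegreeC)` gives `"\uFFFDC"`, while `DecodeDemo.AsLatin1(DecodeDemo.AnsiDegreeC)` gives `"°C"`. How the replacement character is rendered (or dropped) depends on the display, which may be why only "C" appears.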
Yes, the Celsius symbol (°C) shows up as C. If I save the file I'm reading as Unicode and set the StreamReader encoding parameter to Unicode, the symbol shows up properly. Keep in mind that this is a Compact Framework Windows app running on WinCE 5.0 on an HP thin client.
gnomixa
It says it's UTF-8, which seems to behave differently from Unicode in my case.
gnomixa
Yes, UTF-8 and Unicode are not the same.
ctacke
Yes, "Unicode" is somewhat ambiguous, but it usually means UTF-16 (in .NET, Encoding.Unicode is UTF-16 little-endian). The Compact Framework usually supports Unicode the same way as the Bloated Framework, and where it doesn't, that's noted in the MSDN documentation.
bzlm
In my answer above, "Unicode" doesn't refer to a particular Unicode encoding, but rather to the default StreamReader behavior, which tries to determine the file's encoding (from a byte order mark, falling back to UTF-8) and decodes it into Unicode strings either way (i.e. incorrectly if the guess is wrong).
bzlm
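The detection mechanism described in that comment can be seen with a small sketch that feeds bytes to a StreamReader built with default settings (UTF-8, with `detectEncodingFromByteOrderMarks` enabled). A MemoryStream stands in for the file, and the class and method names are hypothetical.

```csharp
using System.IO;
using System.Text;

// Hypothetical demo of StreamReader's default encoding detection.
public static class BomDemo
{
    // Default settings: UTF-8, but a byte order mark at the start of the
    // stream overrides that and selects the encoding it identifies.
    public static string ReadWithDefaults(byte[] bytes)
    {
        using (StreamReader sr = new StreamReader(new MemoryStream(bytes)))
        {
            return sr.ReadToEnd();
        }
    }

    // Encodes text as UTF-16 little-endian, prefixed with its BOM (FF FE),
    // which is roughly what saving as "Unicode" in Notepad produces.
    public static byte[] Utf16WithBom(string text)
    {
        byte[] bom = Encoding.Unicode.GetPreamble();
        byte[] body = Encoding.Unicode.GetBytes(text);
        byte[] result = new byte[bom.Length + body.Length];
        bom.CopyTo(result, 0);
        body.CopyTo(result, bom.Length);
        return result;
    }
}
```

`ReadWithDefaults(Utf16WithBom("°C"))` returns `"°C"` because the BOM identifies the encoding, whereas the ANSI bytes `{ 0xB0, 0x43 }` have no BOM, fall back to UTF-8 and come out as `"\uFFFDC"`.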
A: 

Problem solved:

            using (StreamReader sr = new StreamReader(this._inFilePath, System.Text.Encoding.Default))

Apparently Default corresponds to ANSI, and the files being produced were ANSI (not UTF-8).

Thanks for your replies, everyone!

gnomixa
This is not necessarily a reliable solution. Default does not correspond to one fixed "ANSI" encoding, but to the ANSI code page of the system's default locale. So this would *not* work for reading back text files containing Cyrillic text on a Windows machine set up with Spanish locale settings.
bzlm
As a general rule: if you create the files *yourself*, Encoding.Default is safe to use. But I wouldn't make assumptions like that.
bzlm
We are guaranteed to have exactly the same setup on all HP thin clients across the board. No Chinese, Cyrillic, or Spanish.
gnomixa
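A middle ground between the two positions is to name the code page explicitly instead of relying on Encoding.Default. That way the reading code does not depend on the reader's locale, and on .NET Core / .NET 5+ Encoding.Default is always UTF-8 anyway. The sketch below is a hypothetical helper, not the poster's code. It assumes the files use a known single-byte code page, typically 1252 (Windows-1252) on Western European Windows systems. On .NET Core / .NET 5+, code page 1252 additionally requires registering `CodePagesEncodingProvider`; whether a given Windows CE device supports a code page depends on its OS image.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class AnsiFileReader
{
    // Hypothetical helper: reads the non-empty lines of a file decoded with
    // an explicitly chosen code page. Note that this StreamReader constructor
    // still honours a byte order mark if the file happens to start with one.
    public static List<string> ReadNonEmptyLines(string path, int codePage)
    {
        List<string> lines = new List<string>();
        Encoding encoding = Encoding.GetEncoding(codePage);
        using (StreamReader sr = new StreamReader(path, encoding))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
            {
                if (line.Length == 0)
                {
                    continue; // skip empty lines, as in the question
                }
                lines.Add(line);
            }
        }
        return lines;
    }
}
```

Usage might look like `AnsiFileReader.ReadNonEmptyLines(this._inFilePath, 1252)`. Passing the code page as a parameter keeps the assumption about the file format visible at the call site rather than hidden in the machine's regional settings.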
