I asked a question a day ago about Greek Unicode characters, and now I have a follow-up question that builds on it.

After extracting all my data, I have been preparing it for import into Excel. I had to choose a tab-delimited file because some of my data contains commas (lucky me!).

The issue I'm running into is a very strange character that appears after I import the data into Excel.

The column data in Notepad++ looks like this:

Total Suspended Solids @105°C

The Excel cell data looks like this:

Total Suspended Solids @105Â°C

I don't understand why this is happening. Does it have something to do with how the degree symbol is represented?

P.S. The symbols in this question are copied and pasted directly.

+1  A: 

I'm not absolutely sure, but I think Excel expects Windows-1252 character encoding, so make sure you create your text file using Encoding.GetEncoding("Windows-1252").

For example:

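// On .NET Core / .NET 5+, first call Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)
using (var writer = new StreamWriter(fileName, false, Encoding.GetEncoding("Windows-1252")))
{
    // write the tab-delimited lines here, e.g. writer.WriteLine(line);
}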
Philippe Leybaert
+3  A: 
  1. (More likely) Excel is interpreting your textual data as Latin-1 or Windows-1252, not UTF-8. "Â°" is what you get if you take the UTF-8 bytes for "°" (0xC2 0xB0) and interpret each byte as a Latin-1 or Windows-1252 character. Is there an option for the input encoding when you do your import?
  2. (Less likely) Excel is doing the right thing, but you're double-encoding your data (encoding as UTF-8, then re-interpreting it as an 8-bit encoding and encoding again as UTF-8 or any other Unicode encoding). Notepad++ evidence is against this one.
hobbs
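A minimal C# sketch of the first explanation, assuming .NET 5 or later with top-level statements (on those runtimes Windows-1252 is only available after registering `CodePagesEncodingProvider`). It encodes the degree sign as UTF-8 and then decodes those same bytes as Windows-1252, which turns the single "°" into the two characters "Â°". The variable names are illustrative only.

```csharp
using System;
using System.Text;

// .NET Core / .NET 5+ ship only a few encodings in the box; register the
// code-page provider so that Windows-1252 (code page 1252) is available.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding windows1252 = Encoding.GetEncoding(1252);

string degree = "\u00B0";                              // "°", DEGREE SIGN
byte[] utf8Bytes = Encoding.UTF8.GetBytes(degree);     // two bytes: 0xC2 0xB0

// Decode those UTF-8 bytes as if they were Windows-1252: every byte becomes
// its own character, 0xC2 -> 'Â' (U+00C2) and 0xB0 -> '°' (U+00B0).
string misread = windows1252.GetString(utf8Bytes);

Console.WriteLine(BitConverter.ToString(utf8Bytes));   // C2-B0
Console.WriteLine(misread == "\u00C2\u00B0");          // True
```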
@hobbs: OK, however, the original data is in Excel, and this special "Â" character is not present there.
Chris
Round-tripping the data could introduce the extra character if you are converting the code page from non-UTF-8 to UTF-8.
fbrereto
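To illustrate the round-trip case, here is a sketch under the same assumptions (.NET 5+, top-level statements, code-page provider registered). Decoding with the encoding that produced the bytes gives the original text back, while reading UTF-8 bytes as Windows-1252 and re-encoding the result as UTF-8 stores the extra "Â" in the data itself, which is the double-encoding scenario from the second point of the answer above.

```csharp
using System;
using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding windows1252 = Encoding.GetEncoding(1252);

string text = "@105\u00B0C";                                // "@105°C"
byte[] utf8 = Encoding.UTF8.GetBytes(text);                  // 40-31-30-35-C2-B0-43

// Correct round trip: decode with the same encoding that produced the bytes.
string roundTrip = Encoding.UTF8.GetString(utf8);

// Broken round trip: the UTF-8 bytes are treated as Windows-1252 ("ANSI")
// and the result is converted to UTF-8 again, so the stray 'Â' is now
// stored in the data itself (double encoding).
string misread = windows1252.GetString(utf8);               // "@105Â°C"
byte[] doubleEncoded = Encoding.UTF8.GetBytes(misread);

Console.WriteLine(roundTrip == text);                        // True
Console.WriteLine(BitConverter.ToString(doubleEncoded));     // 40-31-30-35-C3-82-C2-B0-43
```

Once the data has been double-encoded like this, choosing a different encoding on import will not make the "Â" disappear, because it is now a real character in the file.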
The degree character is 0xB0 in Windows-1252 and ISO Latin 1 (in IBM-437 it is 0xF8). When it is encoded as UTF-8, the 0xB0 becomes 0xC2 0xB0. It looks like Excel is reading the file as "ANSI" by default. When you import the data, you can tell Excel to use UTF-8 explicitly by selecting it under "File Origin"; at least, that is where the option was in Excel 2003.
D.Shawley
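Excel's "File Origin" choice is a UI setting, but the same effect can be reproduced in code by decoding one file two ways. This sketch (again assuming .NET 5+ and top-level statements; the temp file name is hypothetical) writes the cell text as UTF-8 without a byte order mark, then reads the identical bytes back once as UTF-8 and once as Windows-1252.

```csharp
using System;
using System.IO;
using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding windows1252 = Encoding.GetEncoding(1252);

// Hypothetical temp file saved as UTF-8 without a byte order mark (BOM),
// so nothing in the file itself says "this is UTF-8".
string path = Path.Combine(Path.GetTempPath(), "tsv-encoding-demo.txt");
File.WriteAllText(path, "Total Suspended Solids @105\u00B0C\r\n", new UTF8Encoding(false));

// The bytes on disk are the same; only the decoding choice differs.
string asUtf8 = File.ReadAllText(path, Encoding.UTF8);      // "... @105°C"
string asAnsi = File.ReadAllText(path, windows1252);        // "... @105Â°C"

File.Delete(path);

Console.WriteLine(asUtf8.Contains("@105\u00B0C"));          // True
Console.WriteLine(asAnsi.Contains("@105\u00C2\u00B0C"));    // True
```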
Or, as Philippe Leybaert points out, you could just write the file with a Windows-1252 encoding instead of UTF-8, as long as you're not trying to use any characters that are unavailable in that encoding.
hobbs
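Before switching to Windows-1252, it may be worth checking that every character survives, especially since the earlier question involved Greek characters. A sketch, assuming .NET 5+ with top-level statements; `FitsWindows1252` is a hypothetical helper. Passing `EncoderFallback.ExceptionFallback` makes the encoder throw on unmappable characters instead of quietly substituting "?" or a best-fit character.

```csharp
using System;
using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Windows-1252 encoder that throws on unmappable characters instead of
// silently writing '?' (or a best-fit substitute).
Encoding strict1252 = Encoding.GetEncoding(
    1252, EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);

// Hypothetical helper: true if every character in s exists in Windows-1252.
bool FitsWindows1252(string s)
{
    try
    {
        strict1252.GetBytes(s);
        return true;
    }
    catch (EncoderFallbackException)
    {
        return false;
    }
}

bool degreeFits = FitsWindows1252("Total Suspended Solids @105\u00B0C");  // ° is 0xB0
bool greekFits = FitsWindows1252("\u03A9\u03BC");                         // "Ωμ": Greek letters

Console.WriteLine(degreeFits);   // True
Console.WriteLine(greekFits);    // False
```

If the check fails, UTF-8 output combined with choosing UTF-8 under "File Origin" on import is the route that keeps such characters.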
A: 

Thanks, Philippe! Excel expects Windows-1252 character encoding! Well, all right!

Antonio
Welcome to Stack Overflow! This is a Q) Please be so kind as to click the `delete` link below your "answer". Also see http://stackoverflow.com/faq
BalusC