For example: "½" or ASCII DEC 189. When I read the bytes from a text file the byte[] contains the valid value, in this case 189.

Converting to Unicode results in the Unicode replacement character 65533.

UnicodeEncoding.Unicode.GetString(b);

Converting to ASCII results in 63 or "?"

ASCIIEncoding.ASCII.GetString(b);

If this isn't possible what is the best way to handle this data? I'd like to be able to perform string functions like Replace().

+4  A: 

It depends on exactly what the encoding is.

There's no such thing as "ASCII 189" - ASCII only goes up to 127. There are many 8-bit encodings which use ASCII for the first 128 values.

You may want Encoding.Default (which is the default encoding for your particular system), but it's hard to know for sure. Where did your data come from?
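
To make the difference concrete, here is a minimal sketch that decodes the single byte 189 with three encodings. It assumes the data is ISO-8859-1, which is only one of the many 8-bit candidates; the class name DecodeDemo is made up for illustration.

```csharp
using System;
using System.Text;

class DecodeDemo
{
    static void Main()
    {
        byte[] b = { 189 };

        // ASCII is 7-bit, so a byte above 127 is replaced with '?' (63).
        string ascii = Encoding.ASCII.GetString(b);
        Console.WriteLine((int)ascii[0]);   // 63

        // Encoding.Unicode is UTF-16LE and expects two bytes per code unit;
        // a lone byte decodes to the replacement character U+FFFD.
        string utf16 = Encoding.Unicode.GetString(b);
        Console.WriteLine((int)utf16[0]);   // 65533

        // Assumption: the data is ISO-8859-1, where byte 189 is U+00BD.
        string latin1 = Encoding.GetEncoding("iso-8859-1").GetString(b);
        Console.WriteLine((int)latin1[0]);  // 189
    }
}
```

One caveat: on .NET Core and .NET 5+, Encoding.Default always returns UTF-8 rather than the system's ANSI code page, so naming the encoding explicitly is the safer choice there.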

Jon Skeet
What I'm reading into the byte[] lines up with 188 - 190 in this extended ascii chart: http://charlie.balch.org/asp/ascii.asp. Encoding.Default did the trick. Thanks a bunch!
rtremaine
Glad it worked - just be aware that anyone who talks about "extended ASCII" as if that means one particular encoding doesn't know what they're talking about. It's like talking about "one dollar" - one US dollar, Australian dollar, Canadian dollar, what? It may make sense in a particular context
Jon Skeet
but it isn't a definitive and unique idea. So I dare say Charlie's idea of "extended ASCII" is appropriate for *his* culture - but it wouldn't match what happens on some other people's computers.
Jon Skeet
+2  A: 

Byte 189 represents a "½" in iso-8859-1 (aka "Latin-1"), so the following may be what you want:

var e = Encoding.GetEncoding("iso-8859-1");
var s = e.GetString(new byte[] { 189 });

All strings and chars in .NET are UTF-16 encoded, so you need an Encoding to convert to or from anything else. Sometimes one is applied by default (e.g. UTF-8 for StreamReader instances), but it is good practice to always specify it.

You will need some form of implicit or (better) explicit metadata to tell you which encoding the data uses.
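
As a rough sketch of specifying the encoding when reading a file and then using Replace(), something like the following should work. It assumes the file really is ISO-8859-1; the temporary file and its contents are made up purely so the example is self-contained.

```csharp
using System;
using System.IO;
using System.Text;

class ReadWithEncoding
{
    static void Main()
    {
        // Assumption: the file is ISO-8859-1. Swap in the real encoding if it differs.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        // Hypothetical sample data: the bytes for "½ cup" in ISO-8859-1.
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[] { 189, 32, 99, 117, 112 });

        // Decode with the explicit encoding instead of relying on a default.
        string text = File.ReadAllText(path, latin1);

        // Once decoded, the usual string functions work as expected.
        string replaced = text.Replace("\u00BD", "1/2");
        Console.WriteLine(replaced);   // 1/2 cup

        File.Delete(path);
    }
}
```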

Richard