ansaurus

Question

Answer 1

+1 A:

This is because, when you convert to a string, the result will contain the UTF-8 BOM, which is three bytes at the beginning.

Darin Dimitrov 2009-11-19 14:39:31

I noted that if you instead use 127 for the byte value, the decoded byte array contains exactly one byte, having the value 127. What happens at 128?

Fredrik Mörk 2009-11-19 14:44:37

At 128, you leave the ASCII range and enter characters whose byte representation depends on the encoding.

jvenema 2009-11-19 14:47:05

The UTF-8 BOM is EF BB BF. This is not the case here. It is the replacement character. See my answer.

Mark Byers 2009-11-19 14:47:30

@jvenema: I am aware of that. Strangely enough, using the byte value 239 produces the same result. All other bytes in the range 128-255 output `False` using the OP's code sample.

Fredrik Mörk 2009-11-19 14:51:44
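
The difference between 127 and 128 discussed in these comments can be checked with a minimal sketch. The OP's code is not shown on this page, so the `RoundTrips` helper below is a hypothetical stand-in that decodes the bytes as UTF-8, re-encodes the string, and compares the result with the input. It also prints the UTF-8 preamble (BOM) so it can be compared with the bytes seen in the other answers.

```csharp
using System;
using System.Linq;
using System.Text;

public static class RoundTripDemo
{
    // Hypothetical helper (not the OP's code): decode as UTF-8,
    // encode the resulting string again, and compare with the input.
    public static bool RoundTrips(byte[] original)
    {
        string text = Encoding.UTF8.GetString(original);
        byte[] decoded = Encoding.UTF8.GetBytes(text);
        return original.SequenceEqual(decoded);
    }

    public static void Main()
    {
        // The UTF-8 preamble (BOM); GetBytes does not prepend it.
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetPreamble())); // EF-BB-BF

        Console.WriteLine(RoundTrips(new byte[] { 127 })); // True: 127 is plain ASCII
        Console.WriteLine(RoundTrips(new byte[] { 128 })); // False: a lone 0x80 is not valid UTF-8
    }
}
```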

Answer 2

+4 A:

This is an invalid UTF-8 byte sequence.

You need

byte[] original = new byte[] { 0xc2, 128 };

Nothing to do with byte order marks.

Update

Or preferably you should do

char[] c = { (char)128 };

leppie 2009-11-19 14:43:45
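
As a rough illustration of the answer above, the sketch below decodes the two-byte sequence `0xC2, 128` and encodes `(char)128`, showing that both describe the same character (U+0080). The class and method names are made up for this example.

```csharp
using System;
using System.Text;

public static class TwoByteDemo
{
    // Thin wrappers around Encoding.UTF8, named only for this example.
    public static string Decode(byte[] utf8) => Encoding.UTF8.GetString(utf8);
    public static byte[] Encode(char[] chars) => Encoding.UTF8.GetBytes(chars);

    public static void Main()
    {
        // 0xC2 0x80 is the valid UTF-8 encoding of code point 128 (U+0080).
        string s = Decode(new byte[] { 0xC2, 128 });
        Console.WriteLine(s.Length);   // 1
        Console.WriteLine((int)s[0]);  // 128

        // Going the other way: start from the char and encode it.
        byte[] bytes = Encode(new[] { (char)128 });
        Console.WriteLine(BitConverter.ToString(bytes)); // C2-80
    }
}
```

Starting from a `char[]` or `string` avoids hand-building byte sequences, which is presumably why the answer prefers it.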

Answer 3

+4 A:

The original data is an invalid UTF-8 sequence.

decoded = { 0xef, 0xbf, 0xbd }

Searching for this byte string turned up this: http://en.wikipedia.org/wiki/Unicode%5FSpecials. It is the UTF-8 encoding of the replacement character (U+FFFD), which the decoder substitutes for invalid input.

Mark Byers 2009-11-19 14:44:39
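
The substitution described in this answer can be observed directly. The sketch below assumes the original input was the single byte 128 (as the comments on Answer 1 suggest) and uses hypothetical helper names. It also shows one possible way to detect invalid input instead of silently replacing it: a `UTF8Encoding` constructed with `throwOnInvalidBytes: true`.

```csharp
using System;
using System.Text;

public static class ReplacementDemo
{
    // Decode with the default (lenient) UTF-8 encoding, then encode again.
    public static byte[] DecodeThenEncode(byte[] original)
    {
        string text = Encoding.UTF8.GetString(original);
        return Encoding.UTF8.GetBytes(text);
    }

    // A strict decoder throws on invalid bytes instead of substituting U+FFFD.
    public static bool IsValidUtf8(byte[] bytes)
    {
        var strict = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false,
                                      throwOnInvalidBytes: true);
        try
        {
            strict.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    public static void Main()
    {
        byte[] decoded = DecodeThenEncode(new byte[] { 128 });
        Console.WriteLine(BitConverter.ToString(decoded));          // EF-BF-BD
        Console.WriteLine(IsValidUtf8(new byte[] { 128 }));         // False
        Console.WriteLine(IsValidUtf8(new byte[] { 0xC2, 0x80 }));  // True
    }
}
```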

Answer 4

+1 A:

In general, you can't round-trip arbitrary bytes in this way, and you are wrong to expect to be able to do so for an arbitrary encoding, in particular for any of the UTF encodings.

However there is an encoding that will allow you to roundtrip for all byte values - Latin1 aka ISO-8859-1 aka CP28591. This encoding is similar but not identical to the default Windows ANSI encoding and is useful for scenarios where roundtripping in this way is important - e.g. writing a stream that mixes text and control characters to a serial port.

See this answer, or other questions that mention Latin1.

Joe 2009-11-19 15:05:04
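
A short sketch of the Latin1 round trip described above: it pushes every byte value from 0 to 255 through `Encoding.GetEncoding("ISO-8859-1")` and back. The class and method names are illustrative only.

```csharp
using System;
using System.Linq;
using System.Text;

public static class Latin1Demo
{
    // Returns true if all 256 byte values survive bytes -> string -> bytes.
    public static bool AllBytesRoundTrip()
    {
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1"); // code page 28591
        byte[] original = Enumerable.Range(0, 256).Select(i => (byte)i).ToArray();

        string text = latin1.GetString(original);
        byte[] decoded = latin1.GetBytes(text);

        return original.SequenceEqual(decoded);
    }

    public static void Main()
    {
        Console.WriteLine(AllBytesRoundTrip()); // True
    }
}
```

This works because Latin1 maps each byte value to the Unicode code point with the same number, so no byte is ever invalid.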

