ansaurus

Question

Can someone explain Encoding.Unicode.GetBytes("hello") for me?

Answer 1

+9 A:

You should read "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" at http://www.joelonsoftware.com/articles/Unicode.html

You can find a list of all Unicode characters at http://www.unicode.org but don't expect to be able to read the files there without learning a lot about text encoding issues.

Nir 2008-11-11 15:10:38

Answer 2

+2 A:

At http://www.unicode.org/charts/ you can find all the Unicode code charts. http://www.unicode.org/charts/PDF/U0000.pdf shows that the code point for 'h' is U+0068. (Another great tool for viewing this data is BabelMap.)

The exact details of UTF-16 encoding can be found at http://unicode.org/faq/utf_bom.html#6 and http://www.ietf.org/rfc/rfc2781.txt. In .NET, Encoding.Unicode is UTF-16 little-endian (UTF-16LE), so each code unit is written low byte first. In short, U+0068 is encoded as 0x68 0x00. In decimal, these are the first two bytes you see: 104 0.

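The other characters are encoded the same way: 'e' (U+0065) as 101 0, 'l' (U+006C) as 108 0, and 'o' (U+006F) as 111 0, for ten bytes in total.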
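To see the same byte layout outside .NET, here is a minimal sketch in Python rather than C#. It assumes that Python's "utf-16-le" codec produces the same bytes as Encoding.Unicode.GetBytes (UTF-16 little-endian, no byte order mark). The helper name utf16le_bytes is made up for this illustration.

```python
def utf16le_bytes(text: str) -> list:
    """Return the UTF-16LE encoding of text as a list of decimal byte values."""
    # "utf-16-le" writes each 16-bit code unit low byte first and adds no BOM.
    return list(text.encode("utf-16-le"))


if __name__ == "__main__":
    # Show each character, its code point, and its two bytes.
    for ch in "hello":
        print(ch, f"U+{ord(ch):04X}", utf16le_bytes(ch))
    # The whole string: every ASCII letter is followed by a zero byte.
    print(utf16le_bytes("hello"))
```

For "hello" the combined list is 104 0 101 0 108 0 108 0 111 0: each letter's code point as the low byte, followed by a zero high byte.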

Finally, apart from the Unicode Standard itself, the Unicode Glossary is a great reference when trying to understand the various Unicode specifications.

Bradley Grainger 2008-11-11 15:35:54
