Hi,

My code:

        // Needs: using System; using System.Text;
        string input1 = Console.ReadLine();

        Console.WriteLine("byte output");

        byte[] bInput1 = Encoding.Unicode.GetBytes(input1);

        // Print each byte's index and its decimal value.
        for (int x = 0; x < bInput1.Length; x++)
            Console.WriteLine("{0} = {1}", x, bInput1[x]);

For the input "hello", the byte values it prints are:

104 0 101 0 108 0 108 0 111 0

Is there a reference to the character map where I can make sense of this?

+9  A: 

You should read "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" at http://www.joelonsoftware.com/articles/Unicode.html

You can find a list of all Unicode characters at http://www.unicode.org but don't expect to be able to read the files there without learning a lot about text encoding issues.

Nir
+2  A: 

At http://www.unicode.org/charts/ you can find all the Unicode code charts. http://www.unicode.org/charts/PDF/U0000.pdf shows that the code point for 'h' is U+0068. (Another great tool for viewing this data is BabelMap.)

The exact details of UTF-16 encoding can be found at http://unicode.org/faq/utf_bom.html#6 and http://www.ietf.org/rfc/rfc2781.txt. In short, U+0068 is encoded in UTF-16LE (the encoding that Encoding.Unicode produces) as 0x68 0x00. In decimal, these are the first two bytes you see: 104 0.

The other characters are encoded similarly.
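To make the byte pairs easier to read, here is a minimal sketch that prints each character next to its code point and its two UTF-16LE bytes. The class name Utf16Demo and the helper Describe are made up for illustration; only Encoding.Unicode.GetBytes comes from the question. It assumes every character fits in one 16-bit code unit, which holds for "hello" but not for characters outside the Basic Multilingual Plane (those take two code units, a surrogate pair).

```csharp
using System;
using System.Text;

static class Utf16Demo
{
    // Hypothetical helper: describes each UTF-16 code unit of `text` as
    // "'c' U+XXXX -> lowByte highByte", with the bytes shown in decimal.
    public static string[] Describe(string text)
    {
        byte[] bytes = Encoding.Unicode.GetBytes(text); // UTF-16, little-endian
        var lines = new string[bytes.Length / 2];

        for (int i = 0; i < bytes.Length; i += 2)
        {
            // Little-endian: the low byte comes first, the high byte second.
            int codeUnit = bytes[i] | (bytes[i + 1] << 8);
            lines[i / 2] = string.Format("'{0}' U+{1:X4} -> {2} {3}",
                (char)codeUnit, codeUnit, bytes[i], bytes[i + 1]);
        }
        return lines;
    }

    static void Main()
    {
        foreach (string line in Describe("hello"))
            Console.WriteLine(line);
    }
}
```

For "hello" the first line should read 'h' U+0068 -> 104 0, which matches the first two bytes in the question's output. The code point in each line is the value to look up in the code charts.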

Finally, apart from the Unicode Standard itself, the Unicode Glossary is a great reference when you are trying to understand the various Unicode specifications.

Bradley Grainger