Let's say I have a random Chinese character, 玩. I want to convert it to Unicode, which would be U+73A9. How could I do this in C#?

+2  A: 

The character 玩 is in Unicode.

If you have it in C# as 玩, then it's currently in UTF-16, which is one of the Unicode encoding forms.

If you are obtaining it from somewhere else you need to:

  1. Find the encoding it is in.
  2. Get the bytes (wrapped by a stream is nice).
  3. Get or write an appropriate Encoding.
  4. Use the encoding to get the string (wrapping the nice stream with a TextReader is nicer).

Step 3 may be simple (oh, I just use that one!), hard (darn, have to write it myself!), or somewhere in between (hey, has anyone written one of these already?!).
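
The steps above can be sketched roughly as follows. This is only an illustration: it assumes the bytes happen to be UTF-8, uses a `MemoryStream` as a stand-in for wherever the bytes really come from, and the `ReadAll` helper name is made up. `Encoding`, `StreamReader`, and `MemoryStream` are standard .NET types.

```csharp
using System;
using System.IO;
using System.Text;

class DecodeSketch
{
    // Hypothetical helper: decodes every byte in the stream with the given encoding.
    public static string ReadAll(Stream stream, Encoding encoding)
    {
        using (var reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        // Assumed input: the character U+73A9 encoded as UTF-8 (E7 8E A9).
        byte[] bytes = { 0xE7, 0x8E, 0xA9 };
        string text = ReadAll(new MemoryStream(bytes), Encoding.UTF8);

        // Once decoded, the string is UTF-16 in memory, so the char can be cast directly.
        Console.WriteLine("U+{0:X4}", (int)text[0]);
    }
}
```

If the source uses a different encoding, passing another `Encoding` instance (for example one obtained from `Encoding.GetEncoding`) should be the only change needed.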

Jon Hanna
What I mean is I want to turn the character into U+73A9
Mass
char c = '\u73a9';
GregS
@Greg- thanks, but I want it the other way around. I want something like 玩 -> \u73a9
Mass
+3  A: 

Take myChar as a char referencing your special character...

Console.WriteLine("{0} U+{1:x4} {2}", myChar, (int)myChar, (int)myChar);

Above we're outputting the character itself, followed by its Unicode code point in hexadecimal and then the same value as a decimal integer.

Reduce the format string and parameters to output only the "U+..." code...

Console.WriteLine("U+{0:x4}", (int)myChar);
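
The cast works here because 玩 fits in a single UTF-16 `char`. Characters outside the Basic Multilingual Plane take two `char` values (a surrogate pair), so a plain cast would print the surrogates rather than the code point. A minimal sketch for that case, assuming well-formed input, could use `char.ConvertToUtf32`; the `CodePoints` helper name is made up for illustration.

```csharp
using System;
using System.Collections.Generic;

class CodePointSketch
{
    // Hypothetical helper: yields "U+XXXX" for each code point in a well-formed string.
    public static IEnumerable<string> CodePoints(string s)
    {
        for (int i = 0; i < s.Length; i++)
        {
            // Combines a surrogate pair into one code point; throws on a lone surrogate.
            int codePoint = char.ConvertToUtf32(s, i);

            // A surrogate pair occupies two chars, so skip the low surrogate.
            if (char.IsHighSurrogate(s[i])) i++;

            yield return "U+" + codePoint.ToString("X4");
        }
    }

    static void Main()
    {
        // U+73A9 followed by U+2000B, a CJK character outside the BMP.
        foreach (string cp in CodePoints("\u73a9\U0002000B"))
        {
            Console.WriteLine(cp);
        }
    }
}
```

For the two-character sample string this prints one line per code point rather than one per `char`.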
Allbite
Thanks, this is awesome! Could you explain the code to me though? I understand you are just writing the U+, but what is `{0:x4}`? I know one of them is some specifier, so what is `:x4`?
Mass
The 'x4' outputs it as hex (x), 4 digits zero padded on the left.
Chris
Thanks! (15 chars...)
Mass
A: 

A slightly longer example that follows the pattern in Jon Hanna's answer:

using System;
using System.Text;

namespace UnicodeDecodeConsoleApplication
{
    class Program
    {
        static void Main(string[] args)
        {
            char c = '\u73a9';
            char[] chars = {c};
            // Big-endian UTF-16 puts the high byte first, so the hex digits
            // come out in code point order (73, then a9).
            Encoding encoding = Encoding.BigEndianUnicode;
            byte[] encodedBytes = encoding.GetBytes(chars);
            StringBuilder stringBuilder = new StringBuilder("U+");
            foreach (byte encodedByte in encodedBytes)
            {
                // "x2" formats each byte as two lowercase hex digits.
                stringBuilder.Append(encodedByte.ToString("x2"));
            }
            Console.WriteLine(stringBuilder); // U+73a9
            Console.ReadLine(); // keep the console window open
        }
    }
}

--jeroen

Jeroen Pluimers