tags:

views:

213

answers:

2

I am working on Korean Document and the HTML Source Code contains special symbols starting with &#char(w) e.g 껰 Now I would like to convert this symbol to its Unicode represntation.

Is there a way to do so.

+1  A: 

First, get the codepoint by converting it to int. Then, use String.Format to obtain the Unicode code string:

string result = string.Format("\\u{0:x4}", (int) chr);

or:

string result = "\\u" + ((int) chr).ToString("x4");
Konrad Rudolph
+1  A: 

HTML uses the &# and &#x notation to encode Unicode characters. So your document already contains the charcters in one possible Unicode notation.

If the sequence starts with &#x the following characters are the hex code of the character. If the sequence starts with &# the following numbers are the decimal code of the character.

Convert these code to hex using ToString("x4") as in Konrad's answer.

devio