I am working on a Korean document, and the HTML source code contains special symbols written as references starting with &# (for example, one that renders as 껰). I would like to convert these symbols to their Unicode representation.
Is there a way to do so?
First, get the code point by casting the character (chr below) to int. Then use String.Format to build the Unicode escape string:
string result = string.Format("\\u{0:x4}", (int) chr);
or:
string result = "\\u" + ((int) chr).ToString("x4");
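To make the snippets above runnable, here is a minimal sketch that applies the same (int) cast and "x4" format to every character of a string. The class and method names (UnicodeEscape, ToEscapes) are hypothetical, chosen only for illustration. Note that it works on UTF-16 code units, so a character outside the Basic Multilingual Plane produces two escapes (a surrogate pair).

```csharp
using System;
using System.Text;

public static class UnicodeEscape
{
    // Hypothetical helper: emits "\uXXXX" for every UTF-16 code unit,
    // using the same (int) cast and "x4" format as the snippets above.
    public static string ToEscapes(string text)
    {
        var sb = new StringBuilder();
        foreach (char chr in text)
        {
            sb.Append("\\u").Append(((int)chr).ToString("x4"));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // U+AC00 is the first Hangul syllable, 가.
        Console.WriteLine(ToEscapes("가")); // prints \uac00
    }
}
```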
HTML uses the &# (decimal) and &#x (hexadecimal) numeric character reference notations to encode Unicode characters, so your document already contains the characters in one possible Unicode notation.
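As a quick check that both notations name the same character, the sketch below decodes the decimal and hexadecimal references for 가 (U+AC00, decimal 44032) with System.Net.WebUtility.HtmlDecode. The example character is my own choice, not one taken from the question's document.

```csharp
using System;
using System.Net;

public static class EntityDemo
{
    public static void Main()
    {
        // The same syllable 가 (U+AC00 = 44032) written in both notations.
        string fromDecimal = WebUtility.HtmlDecode("&#44032;");
        string fromHex = WebUtility.HtmlDecode("&#xAC00;");

        Console.WriteLine(fromDecimal == fromHex);  // True
        Console.WriteLine(fromDecimal == "\uAC00"); // True
    }
}
```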
If the sequence starts with &#x, the characters that follow are the hexadecimal code of the character. If it starts with &# alone, the digits that follow are the decimal code of the character.
Convert these decimal codes to hex using ToString("x4"), as in Konrad's answer.
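The two cases above can be combined into a small converter. This is a sketch under my own assumptions: it uses a regular expression (pattern chosen here, not from the answer) to find references of the form &#NNNN; and &#xHHHH;, parses the digits as decimal or hex, and formats the result with ToString("x4"). The names NumericReferenceConverter and ToUnicodeEscapes are hypothetical. Code points above U+FFFF are split into a surrogate pair via char.ConvertFromUtf32, and invalid references (for example a surrogate value or a number too large for int) will throw rather than being skipped.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class NumericReferenceConverter
{
    // Matches decimal (&#44032;) and hexadecimal (&#xAC00; or &#XAC00;) references.
    private static readonly Regex Reference =
        new Regex("&#(?:[xX](?<hex>[0-9a-fA-F]+)|(?<dec>[0-9]+));");

    // Replaces each numeric character reference with "\uXXXX" escapes.
    public static string ToUnicodeEscapes(string html)
    {
        return Reference.Replace(html, match =>
        {
            int codePoint = match.Groups["hex"].Success
                ? int.Parse(match.Groups["hex"].Value, NumberStyles.HexNumber, CultureInfo.InvariantCulture)
                : int.Parse(match.Groups["dec"].Value, CultureInfo.InvariantCulture);

            // ConvertFromUtf32 returns a surrogate pair for code points above U+FFFF,
            // so each UTF-16 unit gets its own \uXXXX escape.
            string utf16 = char.ConvertFromUtf32(codePoint);
            var sb = new StringBuilder();
            foreach (char unit in utf16)
            {
                sb.Append("\\u").Append(((int)unit).ToString("x4"));
            }
            return sb.ToString();
        });
    }

    public static void Main()
    {
        Console.WriteLine(ToUnicodeEscapes("&#44032; and &#xAC00;")); // \uac00 and \uac00
    }
}
```

Text outside the references is passed through unchanged, so the converter can be run over a whole HTML fragment.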