My question is pretty simple. I have an HTML document hosted in a WebBrowser control.

When I select a Korean word using the MSHTML range property, I can read both range.htmlText and range.Text, and both show the Korean word. All I want to do is convert it to Unicode format.

Is that possible?

FYI, I am doing all this in C# WinForms.

+1  A: 

Could you provide a little more information? What format is the "Korean word" in when you read it? (I assume the same as the HTML document header.) Could you post a sample HTML page from which you are trying to read?

If the problem is simply that the string you are getting is in a different code page, you can use the Encoding classes in .NET to convert it. For example, perhaps your text is in iso-2022-kr. Here is a sample that converts your string, called "stringInKoreanIsoEncoding" in the code below:

// 50225 is the code page for iso-2022-kr.
Encoding koreanEncoding = Encoding.GetEncoding(50225);
// Encode the string with the Korean code page, then re-encode those bytes as UTF-8.
byte[] convertedToUtf8 = Encoding.Convert(koreanEncoding, Encoding.UTF8, koreanEncoding.GetBytes(stringInKoreanIsoEncoding));
// Decode the UTF-8 bytes back into a .NET string.
string utf8String = Encoding.UTF8.GetString(convertedToUtf8);
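
If "Unicode format" instead means the code points or raw bytes of the selected text, it may help that a .NET string is already stored as UTF-16. The following is only a minimal sketch under that assumption. The class UnicodeDemo, the helper ToUnicodeEscapes, and the hard-coded sample string (a stand-in for the value read from the range) are hypothetical names introduced for illustration, not part of MSHTML or the question.

```csharp
using System;
using System.Linq;
using System.Text;

public static class UnicodeDemo
{
    // Hypothetical helper: formats every UTF-16 code unit of the string
    // as a \uXXXX escape. Characters outside the Basic Multilingual Plane
    // appear as two escapes (a surrogate pair).
    public static string ToUnicodeEscapes(string text)
    {
        return string.Concat(text.Select(c => "\\u" + ((int)c).ToString("X4")));
    }

    public static void Main()
    {
        // Stand-in for the text read from the selected range (assumption).
        string selected = "한국어";

        // The code units as escape sequences.
        Console.WriteLine(ToUnicodeEscapes(selected));

        // The same text as raw bytes in two Unicode encodings.
        byte[] utf16Bytes = Encoding.Unicode.GetBytes(selected); // UTF-16, little-endian
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(selected);

        Console.WriteLine(BitConverter.ToString(utf16Bytes));
        Console.WriteLine(BitConverter.ToString(utf8Bytes));
    }
}
```

Which of the three representations is wanted depends on where the text is going next: escapes for embedding in source or script, bytes for writing to a file or stream.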
Dave