I am using a hidden RichTextBox to retrieve the Text property from a RichEditCtrl. rtb->Text returns the text portion in either English or national languages. Just great!

But I need this text as \u12232? \u32232? escapes instead of national characters and symbols, so that it works with my database and the RichEditCtrl. Any idea how to get from “пассажирским поездом Невский” (“by the Nevsky passenger train”) to “\u12415?\u12395?\u23554?\u20219?\u30456?\u35527?\u21729?”, where each national character is represented as “\u23232?”?

If you have one, that would be great. I am using Visual Studio 2008 with C++, a combination of MFC and managed code.

Cheers, and have a wonderful weekend.

A: 

If you need a System::String as an output as well, then something like this would do it:

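using namespace System;
using namespace System::Text;

String^ s = rtb->Text;
StringBuilder^ sb = gcnew StringBuilder(s->Length * 8);
for (int i = 0; i < s->Length; ++i) {
    // "\\u" emits a literal \u; {0} prints the UTF-16 code unit in decimal, unpadded
    sb->AppendFormat("\\u{0}?", (int)s[i]);
}
String^ result = sb->ToString(); // the builder's contents, not the original string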

By the way, are you sure the format is as described? In C++ and C#, \u is the traditional escape sequence for a hexadecimal Unicode code point, exactly 4 hex digits long, e.g. \u0F3A, and it is not normally followed by ?. If you actually want that hexadecimal form, the format specifier {0:X4} should do the trick.

Pavel Minaev
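For the native MFC side, a rough standard-C++ counterpart of the loop above might look like the following. It is only a sketch under stated assumptions: the helper name ToRtfUnicodeEscapes is hypothetical, the input is assumed to be UTF-16 in a std::u16string, and it follows the RTF convention for \uN (N is a signed 16-bit decimal number followed by a fallback character, here ?). It escapes only non-ASCII characters plus the RTF specials \, { and }. The question's samples print values above 32767 as positive numbers (e.g. \u35527?); the sketch uses the spec's negative form instead, so drop that adjustment if you must match the samples exactly.

```cpp
#include <string>

// Hypothetical helper: rewrite UTF-16 text with RTF \uN? escapes.
// Assumption: RTF rules apply, i.e. N is a signed 16-bit *decimal*
// value and '?' is the fallback character shown by older readers.
// Surrogate pairs come out as two escapes, one per code unit.
std::string ToRtfUnicodeEscapes(const std::u16string& text)
{
    std::string out;
    for (char16_t unit : text) {
        if (unit == u'\\' || unit == u'{' || unit == u'}') {
            // RTF special characters are escaped with a backslash.
            out += '\\';
            out += static_cast<char>(unit);
        } else if (unit < 0x80) {
            // Plain ASCII passes through unchanged.
            out += static_cast<char>(unit);
        } else {
            int n = static_cast<int>(unit);
            if (n > 32767) {
                n -= 65536;  // RTF writes units above 32767 as negative numbers
            }
            out += "\\u" + std::to_string(n) + "?";
        }
    }
    return out;
}
```

Line breaks are copied as-is here; real RTF output would normally turn them into \par.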
Thanks Pavel. I will try your suggestion tonight. As for the format, those \u12395?\u23554? etc. are Unicode hex, as you correctly pointed out. I copied and pasted an example from my debug output, so the "?" really does follow the number, which has 5 or 3 digits depending on the language. Where are you from?
val
If they are hex, there shouldn't be 5 digits in them, as `\u` only permits four (and requires exactly four) in C++ and C#. Which "debug output" did you copy it from?
Pavel Minaev
Pavel, you are right about the digits: 5 for Asian languages like Chinese (\u20219?\u30456?\u35527?) and 4 for Russian (\u1086?\u1084?\u1099?) in my example above. For Greek there seem to be only 3 digits, as here: "\u957? \u946?\u959?". All of my examples are snippets only, not full sentences. This is my very first project involving international languages and I'm having lots of fun ;-) I am just back from university after a long day and a few beers. I'll try it tomorrow with a fresh head. Privet (Russian for "hi")!
val
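The digit counts in this exchange fit a decimal reading: RTF defines N in \uN as a decimal number, so Greek letters (U+0370 to U+03FF, 880 to 1023) get three digits, Cyrillic (U+0400 to U+04FF, 1024 to 1279) gets four, and CJK characters get five. A minimal sketch to check this, assuming ASCII RTF input and exactly one fallback character after each escape (the RTF default); the helper name DecodeRtfUnicodeEscapes is hypothetical:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode RTF-style "\uN?" escapes back to UTF-16,
// reading N as a (possibly negative) decimal number. Other text is
// copied through; one fallback character after each escape is skipped.
std::u16string DecodeRtfUnicodeEscapes(const std::string& rtf)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < rtf.size()) {
        if (rtf.compare(i, 2, "\\u") == 0) {
            std::size_t j = i + 2;
            bool negative = false;
            if (j < rtf.size() && rtf[j] == '-') {
                negative = true;
                ++j;
            }
            long n = 0;
            std::size_t digitsStart = j;
            while (j < rtf.size() && rtf[j] >= '0' && rtf[j] <= '9') {
                n = n * 10 + (rtf[j] - '0');  // decimal, not hex
                ++j;
            }
            if (j > digitsStart) {
                if (negative) n = -n;
                if (n < 0) n += 65536;  // back to an unsigned 16-bit code unit
                out += static_cast<char16_t>(n);
                if (j < rtf.size()) ++j;  // skip the fallback character ('?')
                i = j;
                continue;
            }
        }
        // Not an escape: copy the (assumed ASCII) byte as-is.
        out += static_cast<char16_t>(static_cast<unsigned char>(rtf[i]));
        ++i;
    }
    return out;
}
```

With it, \u1086?\u1084?\u1099? decodes to the Cyrillic letters омы and \u957? to the Greek letter ν, which is what a decimal reading predicts.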
A: 

You don't need escaping to put formatted Unicode into a RichEdit control; you can use UTF-8. See my answer here: Unicode RTF text in RichEdit.

I'm not sure what your restrictions are on your database, but maybe you can use UTF-8 there too.

asveikau
Wow, great answer! If I understand correctly, your PSTR Utf8 will look like "\u12395?\u23554?\u20219?" when PWSTR WideString = "Сотрудники главного управления МЧС " ("Employees of the Emergencies Ministry main directorate"). Am I correct?
val
The way UTF-8 works, each WCHAR value of 128 (0x80) or above is represented by 2 or 3 CHAR values, and a surrogate pair of two WCHARs by 4, so the result is raw bytes rather than \u escapes. For example, L"д" becomes "\xd0\xb4". You can read more about how it works in the Wikipedia article on UTF-8.
asveikau
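To make the byte counts concrete, here is a minimal portable sketch of UTF-16 to UTF-8 encoding (on Windows you would normally call WideCharToMultiByte with CP_UTF8 instead). The function name Utf16ToUtf8 is hypothetical, the input is assumed to be a std::u16string, and lone surrogates are not validated. It shows why the UTF-8 result holds raw bytes rather than \u12395?-style escapes:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: encode UTF-16 (WCHAR-style) text as UTF-8 bytes.
// Below 0x80 -> 1 byte, below 0x800 -> 2 bytes, other BMP units -> 3 bytes,
// a surrogate pair (one code point above U+FFFF) -> 4 bytes.
// Lone surrogates are encoded as 3 bytes without any validation.
std::string Utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        // Combine a high/low surrogate pair into one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, u"д" (U+0434) comes out as the two bytes "\xd0\xb4", matching the example in the comment above.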
Thanks again, buddy, you've helped a lot. Cheers! I need a break after a party with my friends at the university tonight. I'll try to fix my code tomorrow ;-)
val