I have a Unicode string, "hao123－－我的上网主页". As a UTF-8 C++ string it shows up as "hao123锛嶏紞鎴戠殑涓婄綉涓婚〉", but I need to write it to a file in this format: "hao123\uFF0D\uFF0D\u6211\u7684\u4E0A\u7F51\u4E3B\u9875". How can I do that? I know little about these encodings. Can anyone help? Thanks!

+2  A: 

You seem to be mixing up UTF-8 and UTF-16 (or possibly UCS-2). UTF-8 encodes each character in a variable length of 1 to 4 bytes. In contrast, you seem to want UTF-16 or UCS-2 values in your file (I am guessing this from the \uXXXX escapes in your output string, each of which holds one 16-bit value written as four hex digits).

For an overview of these character sets, have a look at Wikipedia's article on UTF-8 and browse from there.

Here's some of the very basic basics (heavily simplified):

  • UCS-2 stores all characters as exactly 16 bits. It therefore cannot encode all Unicode characters, only the so-called "Basic Multilingual Plane".

  • UTF-16 stores the most frequently used characters in 16 bits, but characters outside the Basic Multilingual Plane must be encoded as a pair of 16-bit units (32 bits in total).

  • UTF-8 encodes characters with a variable length of 1 to 4 bytes. Only characters from the original 7-bit ASCII charset are encoded as 1 byte.
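To connect the points above to the output format in the question, here is a minimal sketch that decodes UTF-8 by hand and writes every non-ASCII character as a \uXXXX escape of its UTF-16 code units. The names `escapeUtf8` and `appendEscape` are made up for this illustration and do not come from any library. It assumes the input is reasonably well-formed UTF-8: it checks lead and continuation bytes, but does not reject overlong encodings or encoded surrogates, and it does not escape backslashes already present in the input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>

// Append one 16-bit UTF-16 code unit as a \uXXXX escape (4 uppercase hex digits).
static void appendEscape(std::string& out, std::uint32_t unit) {
    assert(unit <= 0xFFFF);
    char buf[7];  // backslash + 'u' + 4 hex digits + terminating NUL
    std::snprintf(buf, sizeof buf, "\\u%04X", static_cast<unsigned>(unit));
    out += buf;
}

// Hypothetical helper: ASCII is copied unchanged, everything else is escaped.
std::string escapeUtf8(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;  // decoded code point
        std::size_t len = 0;   // length of this UTF-8 sequence in bytes
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::runtime_error("truncated UTF-8 sequence");

        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += len;

        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp <= 0xFFFF) {
            appendEscape(out, cp);  // fits in a single UTF-16 code unit
        } else if (cp <= 0x10FFFF) {
            cp -= 0x10000;          // split into a surrogate pair
            appendEscape(out, 0xD800 + (cp >> 10));
            appendEscape(out, 0xDC00 + (cp & 0x3FF));
        } else {
            throw std::runtime_error("code point out of Unicode range");
        }
    }
    return out;
}
```

The returned string is plain ASCII, so it can be written to a file with an ordinary `std::ofstream`. For real projects a maintained Unicode library is probably the safer choice; this only shows the mechanics.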

stakx
Is there any C++ library to convert it to UTF-16?
Dan
Perhaps the following will help: *What is the best unicode library for C?* (http://stackoverflow.com/questions/114611/what-is-the-best-unicode-library-for-c) and the *ustring library* (http://sourceforge.net/projects/ustring/)
stakx