What can I do to solve this? Do I have to do lots of additional manual encoding? The way I understand it, std::string does not care about the encoding, only the bytes, so when I pass it a Unicode string and write it to file, surely that file should contain the same bytes and be recognized as a UTF-8 encoded file?
You are correct that std::string is encoding agnostic. It simply holds an array of char elements. How those char elements are interpreted as text depends on the environment: if your locale is not set to some form of Unicode (e.g. UTF-8 or UTF-16), then when you output the string it will not be displayed or interpreted as Unicode.
Are you sure your string literal "abcdefgàèíüŷÀ" is actually Unicode and not, for example, Latin-1 (ISO-8859-1) or possibly Windows-1252? You need to determine which locale your platform is currently configured to use.
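One way to check, as a minimal sketch: constructing std::locale with an empty string requests the environment's preferred locale, and its name usually reveals the character encoding (e.g. "en_US.UTF-8").

#include <iostream>
#include <locale>

int main() {
    // The empty string requests the user's preferred (environment) locale.
    std::locale loc("");
    std::cout << "Configured locale: " << loc.name() << '\n';  // e.g. "en_US.UTF-8"
    return 0;
}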
-----------EDIT-----------
I think I know your problem: some of the Unicode characters in your charset string literal, such as the accented character "À", are two-byte characters (assuming a UTF-8 encoding). When you index the character-set string with the [] operator in your random_string function, you return only half of a Unicode character, so random_string builds an invalid character string.
For example, consider the following code:
std::string s = "À";
std::cout << s.length() << std::endl;
In an environment where the string literal is interpreted as UTF-8, this program will output 2. The first element of the string (s[0]) is therefore only half of a Unicode character, and not valid on its own. Since your random_string function indexes the string one byte at a time with the [] operator, you are creating invalid random strings.
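To make the byte-level view concrete, here is a minimal sketch (assuming the source file itself is saved as UTF-8) that prints the individual bytes of "À":

#include <cstdio>
#include <string>

int main() {
    // Saved as UTF-8, "À" is stored as the two bytes 0xC3 0x80.
    std::string s = "À";
    for (unsigned char c : s)
        std::printf("%02X ", c);   // prints: C3 80
    std::printf("\n");
    // s[0] is just the lead byte 0xC3 -- half of the encoded character,
    // which is exactly what the []-indexing in random_string hands back.
    return 0;
}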
So yes, you need to use std::wstring and create your charset string literal with the L prefix.
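As a sketch of what that might look like (random_wstring below is a hypothetical stand-in for your random_string, since the original isn't shown; it assumes every character in the charset fits in a single wchar_t, which holds for these accented Latin letters):

#include <iostream>
#include <locale>
#include <random>
#include <string>

// Hypothetical wide-character version of the random-string idea.
// With a wide literal, each character of this charset occupies a single
// wchar_t element, so indexing with [] no longer splits a character.
std::wstring random_wstring(const std::wstring& charset, std::size_t length) {
    static std::mt19937 gen{std::random_device{}()};
    std::uniform_int_distribution<std::size_t> dist(0, charset.size() - 1);

    std::wstring result;
    result.reserve(length);
    for (std::size_t i = 0; i < length; ++i)
        result += charset[dist(gen)];
    return result;
}

int main() {
    // Imbuing the environment locale is often needed so that std::wcout
    // can encode wide characters for the console (platform-dependent).
    std::locale::global(std::locale(""));
    std::wcout.imbue(std::locale());

    const std::wstring charset = L"abcdefgàèíüŷÀ";
    std::wcout << random_wstring(charset, 10) << L'\n';
    return 0;
}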