While testing some functions that convert strings between wchar_t and UTF-8, I ran into the following weird result with Visual C++ 2008 Express:

std::wcout << L"élève" << std::endl;

prints "ÚlÞve", which is obviously not what is expected.

This is obviously a bug. How can that be? How am I supposed to deal with such a "feature"?

+12  A: 

The C++ compiler does not support Unicode in code files. You have to replace those characters with their escaped versions instead.

Try this:

std::wcout << L"\x00E9l\x00E8ve" << std::endl;

Also, your console must support Unicode as well.
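As a hedged aside on the escapes: hexadecimal escapes such as \x00E9 have no fixed length, so they absorb every hex digit that follows (L"\x00E9e" would be parsed as the single value 0xE9E). Universal character names (\uXXXX) always take exactly four hex digits and avoid that trap. The sketch below builds the same word both ways; the function names are made up for illustration.

```cpp
#include <string>

// Hypothetical helper: the word written with universal character names.
// Each \uXXXX takes exactly four hex digits, so the letter that follows
// can never be absorbed into the escape.
std::wstring eleve_from_ucn()
{
    return L"\u00E9l\u00E8ve";
}

// Hypothetical helper: the same word with the hexadecimal escapes used above.
// This works only because 'l' and 'v' are not hexadecimal digits.
std::wstring eleve_from_hex()
{
    return L"\x00E9l\x00E8ve";
}
```

Both functions return a five-character wide string whose first character is U+00E9 and whose third is U+00E8, so either spelling keeps the source file pure ASCII.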

UPDATE:

It's not going to produce the desired output in your console, because the console does not support Unicode.

Dave Van den Eynde
Unfortunately, using Dave's code yields exactly the same output. So I guess it means that the shell doesn't support Unicode.
chmike
It seems I should be able to activate UTF-8 support in the shell by issuing the command chcp 65001. How can I do this from within a program before writing anything out?
chmike
It's not going to output the full UTF-16. You're lucky if you get ANSI output, because the high order bytes are knocked off. But the characters are ANSI page 1252 compatible.
Dave Van den Eynde
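On the chcp 65001 question raised in the comments: the Win32 call SetConsoleOutputCP changes the console output code page from inside a program. The following is only a minimal sketch, not a tested recipe for Visual C++ 2008. The helper names are invented here, the non-Windows branch simply assumes a UTF-8 terminal, and whether the characters actually display also depends on the console font.

```cpp
#include <iostream>
#include <string>
#ifdef _WIN32
#include <windows.h>   // SetConsoleOutputCP, CP_UTF8
#endif

// Hypothetical helper: asks the Windows console to interpret program output
// as UTF-8, the in-program counterpart of "chcp 65001".
// On other platforms it does nothing and reports success
// (assumption: the terminal already expects UTF-8).
bool use_utf8_console_output()
{
#ifdef _WIN32
    return SetConsoleOutputCP(CP_UTF8) != 0;
#else
    return true;
#endif
}

// The word spelled out as UTF-8 bytes so the source file stays ASCII.
std::string eleve_utf8()
{
    return "\xC3\xA9l\xC3\xA8ve";
}

// Hypothetical usage: switch the code page, then write narrow UTF-8 bytes.
void print_eleve()
{
    if (use_utf8_console_output())
        std::cout << eleve_utf8() << std::endl;
}
```

Note that this writes UTF-8 bytes through std::cout; it does not by itself change how std::wcout converts wide characters.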
+1  A: 

You might also want to take a look at this question. It shows how you can actually hard-code Unicode characters into files using some compilers (I'm not sure what the options would be for MSVC).

jkp
A: 

Your IDE and the compiler use the ANSI code page. The console uses the OEM code page.

It also matters what you are doing with those conversion functions.

Mihai Nita
A: 

This is obviously a bug. How can that be?

While other operating systems have dispensed with legacy character encodings and switched to UTF-8, Windows uses two legacy encodings: An "OEM" code page (used at the command prompt) and an "ANSI" code page (used by the GUI).

Your C++ source file is in ANSI code page 1252 (or possibly 1254, 1256, or 1258), but your console is interpreting it as OEM code page 850.
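To make the mismatch concrete, here is a small sketch that takes the bytes "élève" would have in code page 1252 (é = 0xE9, è = 0xE8) and decodes them the way a code page 850 console would. Only those two bytes are mapped; a real table has 128 entries, and the function names are made up for illustration.

```cpp
#include <string>

// Appends one code point below U+0800 to a UTF-8 string.
void append_utf8(std::string& out, unsigned int cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: decodes bytes as a code page 850 console would,
// returning the result as UTF-8. Only 0xE8 and 0xE9 are covered here.
std::string decode_as_cp850_subset(const std::string& bytes)
{
    std::string out;
    for (std::string::size_type i = 0; i < bytes.size(); ++i) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        unsigned int cp;
        if (b < 0x80)       cp = b;        // ASCII is shared by both code pages
        else if (b == 0xE8) cp = 0x00DE;   // 0xE8 is Þ in code page 850
        else if (b == 0xE9) cp = 0x00DA;   // 0xE9 is Ú in code page 850
        else                cp = '?';      // not covered by this sketch
        append_utf8(out, cp);
    }
    return out;
}

// The word as its bytes in code page 1252.
std::string eleve_cp1252()
{
    return "\xE9l\xE8ve";
}
```

Calling decode_as_cp850_subset(eleve_cp1252()) yields the UTF-8 encoding of "ÚlÞve", which matches the output reported in the question.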

dan04