Hello,

I want to create some sample programs that deal with encodings. Specifically, I want to use wide strings like:

wstring a=L"grüßen";
wstring b=L"שלום עולם!";
wstring c=L"中文";

I need this because these are example programs.

This is absolutely trivial with gcc, which treats source code as UTF-8 encoded text. But straightforward compilation does not work under MSVC. I know that I can encode the strings using escape sequences, but I would prefer to keep them as readable text.

Is there any option that I can specify as a command-line switch for "cl" to make this work? Is there any command-line switch like gcc's -finput-charset?

Thanks,

If not, how would you suggest keeping the text natural for the user?

Note: adding a BOM to the UTF-8 file is not an option, because the file then becomes non-compilable by other compilers.

Note 2: I need it to work in MSVC version >= 9 (VS 2008).

The real answer: There is no solution

+1  A: 
Kirill V. Lyadvinsky
The file is already encoded in UTF-8
Artyom
The compiler automatically converts string constants in the file, so the string will end up stored in the EXE using UCS-2 encoding.
Kirill V. Lyadvinsky
OK, I see. The point is that you suggest manually adding a BOM to the UTF-8 file, and it does indeed work, but the problem is that it does not work with gcc and other compilers that do not expect a meaningless BOM.
Artyom
Maybe you should try UTF-16 without a signature. Visual C++ supports it; what about gcc?
Kirill V. Lyadvinsky
No... Also, I assume most compilers can't.
Artyom
OK... I see that there is no solution (as answered by MS). Thanks for the reference; accepting the answer.
Artyom
+1  A: 

For VS you can use:

#pragma setlocale( "[locale-string]" )

The default ANSI code page of the locale will be used as the file encoding.

But in general it is a bad idea to hard-code any user-visible strings in your code. Store them in some kind of resources. That is good for localization, makes spell-checking and updating easy, etc.
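As a rough sketch of what "store them in some kind of resources" could look like: assume the text is kept as UTF-8 bytes outside the source (for example in a plain text file) and decoded into a wstring at run time. The helper name utf8_to_wide is hypothetical, not part of any library, and the decoder is deliberately minimal: it does not reject overlong encodings or encoded surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (an assumption, not a library function):
// decode UTF-8 bytes into a wide string. Minimal validation only.
std::wstring utf8_to_wide(const std::string& in)
{
    std::wstring out;
    for (std::size_t i = 0; i < in.size(); ) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        unsigned long cp;
        std::size_t extra;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::runtime_error("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            // 16-bit wchar_t (as on Windows): emit a surrogate pair.
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<wchar_t>(cp);
        }
    }
    return out;
}

// Small sanity check: pure ASCII input passes through unchanged.
void decoder_self_check()
{
    assert(utf8_to_wide("abc") == L"abc");
}
```

The bytes passed in would typically be read from the resource file with std::ifstream; the source file itself then stays pure ASCII.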

Mihai Nita
"But in general is a bad idea to hard-code any user-visible strings in your code" I know, but this is mostly for examples, where it is important for the user to see what really happens. But how do I specify the UTF-8 charset in the locale string? As far as I know, Windows does not support UTF-8 encoded locales.
Artyom
After a short test, MSVC 2005 fails to accept `setlocale(".65001")`, i.e. the UTF-8 code page.
Artyom
65001 is a code page; the pragma takes a locale. There are no locales with UTF-8 as their code page. If you only need it to work in VS, you can save the file as UTF-16 (from Notepad, "Save as" and select the encoding "Unicode"). The only portable way to do it otherwise is to escape it, as Sherwood Hu suggested. Like it or not, it is the only way. And the right way is to not hard-code it in your C file :-)
Mihai Nita
+2  A: 

IMHO all C++ source files should be in strict ASCII. Comments can be in UTF-8 if the editor supports it.

This makes the code portable across platforms, editors and source control systems.

You can use \u to put Unicode characters into a wide string:

std::wstring str = L"\u20AC123,00"; //€123,00
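Applied to the three strings from the question, the same approach might look like the sketch below. The function names are made up for illustration, and the readable text survives only in the comments. All of the characters are in the Basic Multilingual Plane, so each one is a single wchar_t whether wchar_t is 16 or 32 bits wide.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper names; the strings are the ones from the question,
// written with \u universal character names so the literals stay ASCII.
std::wstring german()
{
    return L"gr\u00FC\u00DFen";  // grüßen
}

std::wstring hebrew()
{
    // שלום עולם!
    return L"\u05E9\u05DC\u05D5\u05DD \u05E2\u05D5\u05DC\u05DD!";
}

std::wstring chinese()
{
    return L"\u4E2D\u6587";  // 中文
}

// Sanity check: one wchar_t per character for these BMP-only strings.
void escapes_self_check()
{
    assert(german().size() == 6);
    assert(hebrew().size() == 10);
    assert(chinese().size() == 2);
}
```

A \u escape always takes exactly four hex digits, so the "en" following \u00DF is not swallowed into the escape.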

Sherwood Hu
That's exactly what I **do not** want to do.
Artyom