In VC++ 2003, I could just save the source file as UTF-8 and all strings were used as is. In other words, the following code would print the strings as is to the console. If the source file was saved as UTF-8 then the output would be UTF-8.
printf("Chinese (Traditional)");
printf("中国語 (繁体)");
printf("중국어 (번체)");
printf("Chinês (Tradicional)");
I have saved the file in UTF-8 format with the UTF-8 BOM. However compiling with VC2008 results in:
warning C4566: character represented by universal-character-name '\uC911'
cannot be represented in the current code page (932)
warning C4566: character represented by universal-character-name '\uAD6D'
cannot be represented in the current code page (932)
etc.
The characters causing these warnings are corrupted. The ones that do fit the locale (in this case 932 = Japanese) are converted to the locale encoding, i.e. Shift-JIS.
I cannot find a way to get VC++ 2008 to compile this for me. Note that it doesn't matter what locale I use in the source file. There doesn't appear to be a locale that says "I know what I'm doing, so don't f$%##ng change my string literals". In particular, the useless UTF-8 pseudo-locale doesn't work.
#pragma setlocale(".65001")
=> error C2175: '.65001' : invalid locale
Neither does "C":
#pragma setlocale("C")
=> see warnings above (in particular locale is still 932)
It appears that VC2008 forces all characters into the specified (or default) locale, and that locale cannot be UTF-8. I do not want to change the file to use escape strings like "\xbf\x11..." because the same source is compiled using gcc which can quite happily deal with UTF-8 files.
Is there any way to specify that compilation of the source file should leave string literals untouched?
To ask it differently, what compile flags can I use to specify backward compatibility with VC2003 when compiling the source file. i.e. do not change the string literals, use them byte for byte as they are.
Update
Thanks for the suggestions, but I want to avoid wchar. Since this app deals with strings in UTF-8 exclusively, using wchar would then require me to convert all strings back into UTF-8 which should be unnecessary. All input, output and internal processing is in UTF-8. It is a simple app that works fine as is on Linux and when compiled with VC2003. I want to be able to compile the same app with VC2008 and have it work.
For this to happen, I need VC2008 to not try to convert it to my local machine's locale (Japanese, 932). I want VC2008 to be backward compatible with VC2003. I want a locale or compiler setting that says strings are used as is, essentially as opaque arrays of char, or as UTF-8. It looks like I might be stuck with VC2003 and gcc though, VC2008 is trying to be too smart in this instance.