+1  Q: 

C++ encoding macro

Is there a macro that tells you which encoding C++ is using for its wchar_t type? I am currently limited to GCC and Clang. I am guessing UTF-32 because my wchar_t is 4 bytes wide, although it could be UTF-16, which also uses 4 bytes for some code points.

But then there is still the question of whether it is UCS-4, UTF-32LE, or UTF-32BE.

Any help/expertise on this topic?

+2  A: 

wchar_t is implementation-specific. It is not bound to any specific encoding. For example, on a platform where wchar_t is 16 bits wide, a single wchar_t simply cannot hold a UTF-32 code unit.

Encoding (UTF-8, UTF-32) and storage (wchar_t) are different things.
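
To make the storage side concrete, here is a minimal sketch that only inspects how wide wchar_t is on the current implementation. The helper names `wchar_bits` and `wchar_can_hold_any_code_point` are made up for illustration, and the result says nothing about which encoding the stored values actually use.

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t
#include <limits>    // std::numeric_limits

// Hypothetical helper: width of wchar_t in bits (a storage property only).
constexpr std::size_t wchar_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}

// Hypothetical helper: true if one wchar_t can represent the largest
// Unicode code point, U+10FFFF. This checks capacity, not encoding.
constexpr bool wchar_can_hold_any_code_point() {
    return static_cast<unsigned long long>(
               std::numeric_limits<wchar_t>::max()) >= 0x10FFFFull;
}
```

On an implementation with a 16-bit wchar_t the second function would return false, which is the situation described above.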

St3fan
Well, you can still store UTF-32 in `uint32_t`, but you won't get a prebuilt string type for it. *However*, you *may* be able to instantiate `std::basic_string<>` with a 32-bit character type...
Mike DeSimone
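
A sketch of that idea, assuming a C++11 or later compiler: there the standard library already provides `char32_t` and `std::u32string` (which is `std::basic_string<char32_t>`), so no custom type is needed. The names `sample_utf32` and `my_utf32_string` are hypothetical and only there for illustration.

```cpp
#include <string>

// std::u32string is std::basic_string<char32_t> (C++11 and later).
// Each element is one 32-bit code unit, which in UTF-32 is one code point.
inline std::u32string sample_utf32() {
    // 'a', EURO SIGN (U+20AC), and one code point outside the BMP (U+1F600).
    return U"a\u20AC\U0001F600";
}

// The same instantiation spelled out by hand, roughly what the comment suggests.
using my_utf32_string = std::basic_string<char32_t>;
```

Note that this fixes the element width at 32 bits regardless of how wide wchar_t happens to be on the platform.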
Because wchar_t is implementation-specific, you cannot use it to write cross-platform (or even cross-implementation) code. To write cross-platform code, you either have to define your own type as Mike says, or stick to char / std::string and use UTF-8.
Jon Reid
@Jon: No, that doesn't help. UTF-8 with std::string has basically the same portability problems as UTF-16 and std::wstring: you'll have to provide a lot of custom string functions. And note that I'm intentionally saying "UTF-16 and wchar_t". Yes, `wchar_t` may be 32 bits. So? `char` may be 16 bits, so `char` and UTF-8 is just as portable as `wchar_t` and UTF-16.
MSalters
@MSalters: +1, I've never been on a system where char is 16 bits, so I didn't know that it was also implementation specific!
Jon Reid
Check `CHAR_BIT` for that. You won't see this on desktop PCs. However, DSPs, small embedded controllers, GPUs, or big number-crunching supercomputers might have a `char` wider than 8 bits.
MSalters
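
The `CHAR_BIT` check can be sketched like this. The function names are hypothetical; the only standard piece is the `CHAR_BIT` macro from `<climits>`, which is guaranteed to be at least 8.

```cpp
#include <climits>   // CHAR_BIT

// Hypothetical helper: number of bits in a char on this implementation.
constexpr int bits_per_char() {
    return CHAR_BIT;
}

// Hypothetical helper: true if char is exactly one octet, which is what
// most UTF-8 handling code silently assumes.
constexpr bool char_is_octet() {
    return CHAR_BIT == 8;
}
```

Code that depends on 8-bit chars could guard itself with `static_assert(char_is_octet(), "...")`, so that it fails to compile on the unusual targets mentioned above instead of misbehaving at run time.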
A: 

There is no such macro in C++. In C99, there is the macro `__STDC_ISO_10646__`, which indicates that wchar_t values are Unicode (ISO/IEC 10646) code points. In C++, the encoding of characters stored in wchar_t depends on the locale and is implementation-defined. In other words, you need to consult the documentation of the C++ implementation you use to see which wchar_t encoding is associated with each locale.
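
For what it is worth, here is a minimal sketch of testing for that macro. Whether a given C++ compiler and C library expose `__STDC_ISO_10646__` is an assumption you would need to verify for your toolchain; the helper names below are hypothetical. When the macro is defined, it expands to an integer constant of the form yyyymmL naming the ISO/IEC 10646 revision that is supported.

```cpp
// Hypothetical helper: the yyyymm value of __STDC_ISO_10646__ if the
// implementation defines it, otherwise 0.
constexpr long iso_10646_version() {
#ifdef __STDC_ISO_10646__
    return __STDC_ISO_10646__;
#else
    return 0L;  // macro absent: the wchar_t encoding is not advertised
#endif
}

// Hypothetical helper: true if the implementation states that wchar_t
// values are ISO/IEC 10646 (Unicode) code points.
constexpr bool wchar_is_iso_10646() {
    return iso_10646_version() != 0;
}
```

A result of 0 does not mean wchar_t is not Unicode. It only means the implementation makes no promise through this macro, so the documentation is still the place to look.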

mloskot