That is, why does unsigned short var = L'ÿ'; work, but unsigned short var[] = L"ÿ"; does not?
From what I remember of C:
- 'y' is a character constant. It is an integer value, so it can be converted to another integer type such as unsigned short.
- "y" is a string constant, that is, an array of characters, and it cannot be converted to an integer value.
L'ÿ' is of type wchar_t, which can be implicitly converted into an unsigned short. L"ÿ" is of type wchar_t[2], which cannot be implicitly converted into unsigned short[2].
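The difference between the two literals can be sketched in a few lines of C. The function names below are made up for illustration, and the escape \xff is used in place of a literal ÿ so the example does not depend on the encoding of the source file.

```c
#include <stddef.h>  /* wchar_t, size_t */

/* A wide character constant is an integer value, so initializing an
   unsigned short from it is an ordinary integer conversion.
   \xff is the code for ÿ in Unicode-based wide encodings. */
unsigned short wide_char_as_ushort(void)
{
    unsigned short var = L'\xff';
    return var;
}

/* L"\xff" is an array of wchar_t with two elements:
   the character itself plus the terminating null. */
size_t wide_literal_length(void)
{
    return sizeof L"\xff" / sizeof(wchar_t);
}

/* The array form is the one that fails when wchar_t is not unsigned short:
   unsigned short bad[] = L"\xff";
*/
```

Calling wide_char_as_ushort() yields 0xff and wide_literal_length() yields 2. The commented-out line is the one a compiler rejects on platforms where wchar_t is some other type.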
L is the prefix for wide character literals and wide string literals. It is part of the language itself, not something provided by a header, and it is not GCC-specific. They would be used like so:
wchar_t some_wchar = L'ÿ';
const wchar_t *some_wstring = L"ÿ"; // or wchar_t some_wstring[] = L"ÿ";
You can do unsigned short something = L'ÿ'; because a conversion is defined from wchar_t to unsigned short. There is no such conversion defined from a wchar_t array (or a wchar_t*) to an array of unsigned short.
In C, wchar_t is just a typedef for one of the standard integer types. The compiler implementor chooses a type that is large enough to hold all wide characters. If you don't include a header that defines it (for example <stddef.h> or <wchar.h>), this is still true and L'ß' is well defined; you as a programmer simply don't know what type it has.
Your initialization of an integer type works because there are rules for converting one integer type into another. Assigning a wide character string (i.e. the address of the first element of a wide character array) to an integer pointer is only possible if you correctly guess the integer type to which wchar_t corresponds. There is no automatic conversion between pointers of different types, unless one of them is void*.
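If the data really has to end up in an unsigned short array, one option that does not rely on guessing the type behind wchar_t is to copy the characters one at a time, so that each element goes through the ordinary integer conversion. This is a minimal sketch; copy_wide_to_ushort is a hypothetical helper, not a standard function.

```c
#include <stddef.h>  /* wchar_t, size_t */

/* Hypothetical helper: copies at most n - 1 wide characters from src
   into dst, converting each element to unsigned short, and always
   null-terminates dst when n > 0. Returns the number of characters
   copied, not counting the terminator. */
size_t copy_wide_to_ushort(unsigned short *dst, size_t n, const wchar_t *src)
{
    size_t i = 0;

    if (n == 0)
        return 0;

    while (i + 1 < n && src[i] != L'\0') {
        /* Values that do not fit in unsigned short are reduced
           modulo USHRT_MAX + 1 by this conversion. */
        dst[i] = (unsigned short)src[i];
        i++;
    }
    dst[i] = 0;
    return i;
}
```

For example, with unsigned short buf[4], the call copy_wide_to_ushort(buf, 4, L"\xff") stores 0xff followed by 0 in buf and returns 1. On platforms with a 32-bit wchar_t, characters outside the 16-bit range are silently truncated, so this only makes sense when the text is known to fit.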
Chris has already given the correct answer, but I'd like to offer some thoughts on why you may have made the mistake to begin with. On Windows, wchar_t was defined as 16-bit way back in the early days of Unicode, when Unicode was intended to be a 16-bit character set. Unfortunately this turned out to be a bad decision (it makes it impossible for the C compiler to support non-BMP Unicode characters in a way that conforms to the C standard), but they were stuck with it.
Unix systems have generally used a 32-bit wchar_t, which of course means short * and wchar_t * are incompatible pointer types there.
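To see which situation applies on a given platform, a small check like the following may help. It is only a sketch: the function names are invented for illustration, and wchar_is_ushort relies on C11 _Generic and on wchar_t being a typedef, which holds in C but not in C++.

```c
#include <limits.h>  /* CHAR_BIT */
#include <stddef.h>  /* wchar_t, size_t */

/* Width of wchar_t in bits on the current platform. */
size_t wchar_bits(void)
{
    return sizeof(wchar_t) * CHAR_BIT;
}

/* Returns 1 if wchar_t is the very same type as unsigned short,
   in which case unsigned short var[] = L"..."; is accepted,
   and 0 otherwise. */
int wchar_is_ushort(void)
{
    return _Generic((wchar_t)0, unsigned short: 1, default: 0);
}
```

Code that depends on wchar_is_ushort() returning 1 is not portable, which is why it is usually better to declare wide strings as wchar_t arrays in the first place.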