views:

181

answers:

4

When we write a program that supports both Unicode and multi-byte character sets,
we often wrap string literals in the _T("some string") macro.

But does a character literal also need to be wrapped in this macro?

Are L'A' and 'A' exactly the same?
Don't we need to write _T('A') for a character?

+5  A: 

No, L'A' is a unicode char of type wchar_t while 'A' is an ASCII char of type char. Here's the MSDN on string literals.

Johannes Rudolph
'A' isn't necessarily ASCII and L'A' isn't necessarily unicode.
John Burton
Yes, L'A' isn't necessarily unicode. But I can't understand what "'A' isn't necessarily ASCII" means.
Benjamin
It means, "`'A'` (and in general the `char` type) isn't necessarily ASCII". For example, it might be EBCDIC. But not on Microsoft compilers, which is what Johannes is (reasonably IMO) talking about, on account of you mentioning `_T` in the question.
Steve Jessop
thanks Steve :)
Benjamin
+1  A: 

L'A' is a wchar_t, 'A' is a char. They are different types and also have different sizes.

You should use _T('A'), which adds the L prefix to the literal if the _UNICODE macro is defined.
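
As a rough illustration of the type and size difference, here is a minimal C++ sketch. The names `narrow_size` and `wide_size` are made up for this example, and the size of `wchar_t` is implementation-defined, so the comments only describe what is commonly seen.

```cpp
#include <cstddef>
#include <type_traits>

// In C++, a narrow character literal has type char.
static_assert(std::is_same<decltype('A'), char>::value, "'A' is a char");

// A wide character literal (L prefix) has type wchar_t.
static_assert(std::is_same<decltype(L'A'), wchar_t>::value, "L'A' is a wchar_t");

// sizeof(char) is 1 by definition. sizeof(wchar_t) is implementation-defined:
// typically 2 with Microsoft compilers and often 4 with GCC or Clang on Linux.
constexpr std::size_t narrow_size = sizeof('A');
constexpr std::size_t wide_size = sizeof(L'A');

static_assert(narrow_size == 1, "a char literal occupies one byte");
static_assert(wide_size >= narrow_size, "wchar_t is at least as large as char");
```

Because the checks are `static_assert`s, the file either compiles or reports which assumption failed on the platform at hand.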

sergiom
L'A' is four bytes on my platform.
gnud
edit: changed assertion about wchar_t to be 2 bytes long. :)
sergiom
A: 

_T is a macro from Visual Studio (declared in <tchar.h>). If the Character Set property in your project's properties is set to 'Use Multi-Byte Character Set', _T expands to nothing and the literal is left unchanged; if it is set to 'Use Unicode Character Set', _T prepends L to the literal. The macro exists so that a project can be switched between the Unicode and multi-byte character sets without any other source changes.
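
The mechanism can be sketched with a portable stand-in. `MY_UNICODE`, `MY_T`, `MY_T_IMPL` and `kGreeting` are hypothetical names invented for this example; the real `_T` comes from `<tchar.h>` and is driven by `_UNICODE`, and its actual definition may differ in detail.

```cpp
#include <type_traits>

// Hypothetical switch playing the role of _UNICODE.
// Remove this line to get the narrow (multi-byte) variant.
#define MY_UNICODE

#ifdef MY_UNICODE
  // Token-paste the L prefix onto the literal: 'A' becomes L'A'.
  #define MY_T_IMPL(x) L##x
#else
  // Leave the literal untouched.
  #define MY_T_IMPL(x) x
#endif

// Extra level of indirection so that macro arguments are expanded first.
#define MY_T(x) MY_T_IMPL(x)

// With MY_UNICODE defined, both character and string literals become wide.
static_assert(std::is_same<decltype(MY_T('A')), wchar_t>::value,
              "MY_T('A') is a wide character literal here");
constexpr const wchar_t* kGreeting = MY_T("some string");
```

The same macro works for character literals and string literals, which is why `_T('A')` is written the same way as `_T("some string")`.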

zabulus
+4  A: 

If you write 'A', and that value gets converted to wchar_t, then on Microsoft compilers at least, it will have the same value as if you'd written L'A' or _T('A').

The same can't be said of string literals, since there is no useful conversion from const char* to const wchar_t*. I think this means it's rather less important to get character literal types right, than string literals.

It's easy to write code that behaves differently according to whether a character literal is wide or narrow - just have an overloaded function that does something completely different. But in practice, sensible functions overloaded to take both types of character are going to end up doing the same thing with 'A' that they do with L'A'. And functions which aren't overloaded, and only take wchar_t, can take 'A' just fine.
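
A minimal sketch of these points, assuming an ASCII-compatible implementation such as the mainstream desktop compilers. The helpers `is_upper_a` and `code_of` are hypothetical names used only for illustration.

```cpp
// Overloads for both character types that do the "same thing"
// from the caller's point of view.
constexpr bool is_upper_a(char c) { return c == 'A'; }
constexpr bool is_upper_a(wchar_t c) { return c == L'A'; }

// A function that only takes wchar_t. A narrow char argument is
// implicitly converted to wchar_t at the call site.
constexpr int code_of(wchar_t c) { return static_cast<int>(c); }

// Each literal selects the matching overload and gives the same answer.
static_assert(is_upper_a('A'), "narrow overload");
static_assert(is_upper_a(L'A'), "wide overload");

// Assumption: the wide character set extends the narrow one, so the
// converted narrow literal equals the wide literal. This is not claimed
// here to be guaranteed on every implementation.
static_assert(code_of('A') == code_of(L'A'), "'A' converts to the value of L'A'");
static_assert(static_cast<wchar_t>('A') == L'A', "same check without the helper");
```

If the last two assertions fail to compile on some platform, that platform is one where the narrow and wide encodings of 'A' differ.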

I don't immediately see anything in the standard to require that L'A' == (wchar_t)'A', so in theory non-Microsoft compilers might do something completely different. But you'd normally expect the wide character set to be an extension of the narrow character set, just as Unicode extends ISO-8859-1. To be specific about what "extension" means: code points which are equal as integers designate the "same character".

Steve Jessop
My understanding is that `L'A' == (wchar_t)'A'` is mandated in both C and C++. The constraint no longer holds in C (since TC2; some details were modified in TC3) if `__STDC_MB_MIGHT_NEQ_WC__` is 1. C++0X imports the C TC3 solution. For the C discussion, see http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_321.htm
AProgrammer
Assuming your understanding is correct, I still can't actually *find* it in the C++ standard, either in the definition of the executable wide character set, the definition of wide character literals, or the definition of `wchar_t`. Doesn't mean it's not there.
Steve Jessop
The POSIX standard requires narrow and wide characters to have the same numeric value *for characters that are in the POSIX portable character set*. So, while L'A' == (wchar_t)'A' is guaranteed, L'€' == (wchar_t)'€' isn't.
dan04
I disagree that functions overloaded to take char and wchar_t should do the same thing. If I want to read a file's contents into a char string, I'll just read it. If I want to read a file's contents into a wchar_t, I have to decode it, because nobody uses UTF-16.
dan04
@dan04. You can't read a file's contents into a `wchar_t`. What I said about overloading on `char` vs. `wchar_t` applies only to what I said, not also to overloading on `char*` vs `wchar_t*`. But for example `std::isspace` should return the same for a `char` and a `wchar_t` that represent the "same character". That requirement about the POSIX-portable character set rules out a system which is EBCDIC, but uses unicode for wide chars. Hence the proposal to remove the equivalent restriction from C99, provided the implementation sets `__STDC_BTOWC_NEQ_WCTOB__`, as in AProgrammer's link above.
Steve Jessop
Anyway, by "do the same thing", I sort of mean "a function which claims to treat both types as characters via overloads, will do the same thing in the context of the function, from the user's POV". `std::cout << 'A';` and `std::cout << L'A';` don't "do the same thing", but then that's because `std::cout` is a stream of `char`, and only supports `wchar_t` because it supports everything in the universe and somehow coerces it to one or more chars. A stream of wide characters would hopefully print `A` in both cases.
Steve Jessop
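
The last point about wide streams can be checked with a small sketch. The helper names `both_to_wide_stream` and `demo` are made up for this example, and the default "C" locale is assumed. The narrow-stream case (`std::cout << L'A'`) is left out because its behavior depends on the language version.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Writes a narrow 'A' and a wide L'A' to the same wide stream
// and returns what the stream collected.
inline std::wstring both_to_wide_stream() {
    std::wostringstream out;
    out << 'A';   // narrow char: the wide stream widens it before writing
    out << L'A';  // wide char: written as is
    return out.str();
}

// Both insertions produce the same character in the wide stream.
inline void demo() {
    assert(both_to_wide_stream() == L"AA");
}
```

Calling `demo()` from `main` is enough to exercise the check.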