Hi all,

I'm coding up a new (personal hobby) app for Windows in C++.

In previous low-level Windows stuff I've used _TCHAR (or just TCHAR) arrays/basic_strings for string manipulation.

Is there any advantage to using _TCHAR over straight up Unicode with wchar_t, if I don't care about Windows platforms pre Win2k?


edit: after submitting I discovered a similar question here from Oct 2008:

http://stackoverflow.com/questions/234365/is-tchar-still-relevant

There seems to be more consensus now about ditching TCHAR.

+10  A: 

No, there is not. Just go with wchar_t.

TCHAR is only useful if you want to be able to use a conditional compilation switch to convert your program to operate in ASCII mode. Since Win2K and up are Unicode platforms, this switch does not provide any value. You'd instead be implying to other developers on your project that ASCII was a valid target when in fact it's not.
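
For reference, the switch in question looks roughly like this in the Windows headers (an illustrative sketch only - the real definitions live in the SDK's winnt.h/winuser.h):

```cpp
// Illustrative sketch of the ANSI/Unicode switch - not the actual SDK headers.
#ifdef UNICODE
    typedef wchar_t TCHAR;                  // wide (Unicode) build
    #define SetWindowText  SetWindowTextW   // API macros resolve to the W entry points
#else
    typedef char TCHAR;                     // "ASCII"/ANSI build
    #define SetWindowText  SetWindowTextA   // ...or to the A entry points
#endif
```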

JaredPar
+1. But I don't understand the comment on ASCII not being a valid target...? If you use TCHAR correctly, it is, but I agree it doesn't sound necessary in this context.
Kim Gräsman
@Kim: I think the point Jared is trying to make is that if kibibu is considering replacing TCHAR with wchar_t, his intent is for the program to be Unicode-only (since using wchar_t would lock you into Unicode). In that sense, ASCII is an invalid target simply because the programmer doesn't want the program to use ASCII (whereas using TCHAR would imply the programmer is OK with either ASCII or Unicode).
GRB
This, combined with jalf's point about interfacing with non-Windows libraries, is probably the most important bit in my opinion. Just because the code says TCHAR on the tin doesn't mean it'll run, or even compile, in ANSI mode when interfacing with other libraries.
kibibu
+7  A: 

wchar_t is a C++-defined type that is 16 bits wide in Visual Studio but 32 bits wide with various GCC toolchains. A 32-bit wchar_t can hold any Unicode code point directly; the 16-bit version holds UTF-16 code units, so code points outside the Basic Multilingual Plane take a surrogate pair.

TCHAR and _TCHAR are not "Unicode" characters - they're meant to be used in code that may be compiled and/or used in either Unicode or ANSI programs:

_TCHAR is - as its leading underscore suggests - a Microsoft C runtime library "extension" to the C++ standard. _TCHAR is a 16-bit character when _UNICODE is defined, a multi-byte character when _MBCS is defined, and a single-byte character when neither is defined. Use this type if you use the CRT string functions prefixed with _tcs: _tcscpy(), for example, is the generic replacement for strcpy()/wcscpy().

TCHAR is defined by the Win32 API. This type is a CHAR when UNICODE is not defined, and a WCHAR when UNICODE is defined (note the lack of an underscore on this type and its macro). Windows API functions that take strings likewise expect WCHAR strings in UNICODE builds and CHAR strings in non-Unicode builds.
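
To make that concrete, here is a minimal sketch (MSVC-specific, since <tchar.h> is a Microsoft header) of the generic-text style versus committing to wchar_t outright:

```cpp
#include <tchar.h>   // _TCHAR, _T(), and the _tcs* generic-text mappings (MSVC)
#include <string.h>  // strlen - the ANSI/MBCS expansion of _tcslen
#include <wchar.h>   // wcslen - the Unicode expansion of _tcslen

int main()
{
    // _T("...") expands to L"..." when _UNICODE is defined and to "..."
    // otherwise, so this line compiles in Unicode, MBCS and single-byte builds.
    const _TCHAR* generic = _T("hello");
    size_t n = _tcslen(generic);        // maps to wcslen() or strlen()

    // Committing to Unicode removes the indirection entirely:
    const wchar_t* wide = L"hello";
    size_t m = wcslen(wide);

    return (n == m) ? 0 : 1;
}
```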

To summarize:

  • wchar_t is a cross-platform, C++-defined type that can hold a Unicode code point - on compilers other than Microsoft's it is frequently 32 bits wide.
  • _TCHAR is a Microsoft-defined type, paired with the _tcs* C runtime functions, whose underlying type changes based on the definition of _UNICODE and/or _MBCS.
  • TCHAR is a Windows API-defined type whose underlying type changes based on the definition (or not) of UNICODE.
  • WCHAR is the native Windows API type for dealing with Unicode strings. When using a GCC toolchain to build Windows code, this will always be 16 bits wide, whereas wchar_t might not be.

Does that help? I don't know.

Chris Becke
It certainly does. Does using TCHARs with the _tcs* functions break things on MBCS platforms?
kibibu
Break things? No. As long as you ensure that _UNICODE is defined when UNICODE is, and/or _MBCS is defined when UNICODE is not, the Win32 types and the CRT generic-text types will change in unison and stay compatible. Win32 doesn't bother to define an explicit MBCS 'mode' - the decision there was not to treat multi-byte character sets differently from single-byte ones.
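
One cheap safeguard (just a convention, not something the headers require) is to fail the build if the two macros ever get out of sync:

```cpp
// Sanity check: refuse to compile if the Win32 and CRT character-set
// macros disagree with each other.
#if defined(UNICODE) && !defined(_UNICODE)
    #error "UNICODE is defined but _UNICODE is not"
#elif defined(_UNICODE) && !defined(UNICODE)
    #error "_UNICODE is defined but UNICODE is not"
#endif
```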
Chris Becke
+4  A: 

I'd go with plain wchar_t.

The advantage to TCHAR is that it allows you to toggle Unicode on and off and your code accessing the Windows API will keep working.

The problem with it is that no other API will accept it.

std::cout will choke on a std::wstring, std::string will choke on being initialized from a wchar_t*, and so on.
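
A minimal illustration of the mismatch (the commented-out lines fail to compile):

```cpp
#include <iostream>
#include <string>

int main()
{
    std::wstring wide = L"wide text";
    // std::cout << wide;              // error: no operator<< for std::wstring on a narrow stream
    std::wcout << wide << L'\n';       // fine: wide string, wide stream

    // std::string narrow = L"oops";   // error: std::string has no constructor taking wchar_t*
    std::string narrow = "narrow text";
    std::cout << narrow << '\n';
    return 0;
}
```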

From the point of view of every other library, you should use either char or wchar_t, and switching between them is nontrivial.

And since non-Unicode compatibility was only really an issue on Windows 95, there's no point in supporting both any more. Enable Unicode, use wchar_t, and save yourself the headaches.

Of course, to avoid confusion, you might also want to call the *W versions of the Win32 functions explicitly: CreateWindowW instead of CreateWindow, for example, so that even if someone compiles your code with Unicode disabled in the project settings, it will still work. If you're going to hardcode for Unicode support, you might as well do so consistently.
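
For example, a hypothetical fragment (class registration and message loop omitted; "MyWindowClass" is just an illustrative name):

```cpp
#include <windows.h>

// Calling the explicit W entry point compiles and behaves the same whether
// or not UNICODE is defined in the project settings.
HWND MakeMainWindow(HINSTANCE instance)
{
    return CreateWindowW(
        L"MyWindowClass",            // hypothetical class name, registered elsewhere with RegisterClassExW
        L"My App",                   // window title
        WS_OVERLAPPEDWINDOW,
        CW_USEDEFAULT, CW_USEDEFAULT, 800, 600,
        nullptr,                     // parent
        nullptr,                     // menu
        instance,
        nullptr);                    // creation parameter
}
```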

jalf
I hadn't considered the std::cout issue, but I don't plan on using console output anyway.
kibibu
well, that's just an example. The same issue pops up in most of the rest of the standard library. You have to use either `string` or `wstring`. There's no "tstring" unless you define it yourself. File streams too: `wifstream` versus `ifstream`. The point is that *nothing* outside the Win32 API allows for the same "might be `char`, might be `wchar_t`" agnosticism, which makes it pretty pointless to use TCHAR today. It was useful when Microsoft introduced Unicode support, back in Windows 95/98. Today it causes a number of problems, and the benefits are nonexistent.
jalf
Yeah, I meant to add that there are many other libraries that require one or the other _explicitly_, so TCHAR is more obfuscation than anything else. The question arose initially because I was looking at the Pango text lib, which uses UTF-8 strings passed as char *. Using TCHARs I'd have to scatter #ifdefs around or define a PANGO_ENCODE macro to make it work, without making the TCHAR just a meaningless wrapper around wchar_t anyway. I'm going to use GDI+ instead.
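
For what it's worth, bridging from wchar_t strings to a UTF-8 char* API like Pango's is only a few lines with WideCharToMultiByte - a rough sketch, with error handling kept minimal:

```cpp
#include <windows.h>
#include <string>

// Convert a UTF-16 wchar_t string to UTF-8 for APIs that take char*.
// Rough sketch: returns an empty string on conversion failure.
std::string ToUtf8(const std::wstring& wide)
{
    if (wide.empty()) return std::string();
    int bytes = ::WideCharToMultiByte(CP_UTF8, 0, wide.c_str(), (int)wide.size(),
                                      nullptr, 0, nullptr, nullptr);
    if (bytes <= 0) return std::string();
    std::string utf8(bytes, '\0');
    ::WideCharToMultiByte(CP_UTF8, 0, wide.c_str(), (int)wide.size(),
                          &utf8[0], bytes, nullptr, nullptr);
    return utf8;
}
```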
kibibu
A: 

Use TCHAR. It keeps your application open to both Unicode and non-Unicode deployment scenarios. Of course, there is more pain with string conversions and with using the standard string manipulation functions.

dominolog
Is there a non-Unicode deployment scenario I should care about now?
kibibu
A: 

If you always compile your app for Unicode while you are developing it, IT WILL NOT WORK when compiled for ANSI strings, even if it is crammed full of TCHARs. (Toy apps excepted.)

That's what JaredPar was getting at when he said ANSI is not a valid target. If you want to maintain Unicode and ANSI versions, you can use TCHAR to do that, but just using TCHAR and other T's won't get you there - you have to actively build and maintain both versions. Definitely not worth it anymore for most apps.

Mark Gilbert