I am in the process of learning C++ and came across an article on MSDN here:

http://msdn.microsoft.com/en-us/magazine/dd861344.aspx

In the first code example, the line of code my question relates to is the following:

VERIFY(SetWindowText(L"Direct2D Sample"));

More specifically, that L prefix. I had a little read-up, and correct me if I am wrong :-), but this is to allow for Unicode strings, i.e. it marks the literal as a wide-character string. During my read-up on this I came across another article on Advanced String Techniques in C here: http://www.flipcode.com/archives/Advanced%5FString%5FTechniques%5Fin%5FC-Part%5FI%5FUnicode.shtml

It says there are a few options, including defining the macro:

#define UNICODE

OR

#define _UNICODE

in C; again, point out if I am wrong, I appreciate your feedback. Further, it shows the datatype suitable for these Unicode strings as:

wchar_t

It throws into the mix a macro and a kind of hybrid datatype, the macro being:

_TEXT(t)

which simply prefixes the string literal with the L, and the hybrid data type being:

TCHAR

This, it points out, will allow for Unicode if the macro is defined and ASCII if not. Now my question is, or it's more of an assumption which I would like to confirm: would Microsoft use this more flexible TCHAR data type, or is there any benefit to committing to wchar_t?
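To check my understanding of the mechanics, here is a rough sketch of how I picture these expanding (simplified by me; not the real <tchar.h>, which uses an extra level of indirection):

    // Simplified illustration of the mechanism, not the actual header.
    #ifdef _UNICODE
        typedef wchar_t TCHAR;
        #define _TEXT(t) L##t      // _TEXT("hi") becomes L"hi"
    #else
        typedef char TCHAR;
        #define _TEXT(t) t         // _TEXT("hi") stays "hi"
    #endif

    TCHAR greeting[] = _TEXT("Hello");  // wide or narrow depending on _UNICODE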

Also, when I say "does Microsoft use this", I mean more specifically in the ATL and WTL libraries, for example. Do any of you have a preference or some advice regarding this?

Cheers,

Andrew

+10  A: 

For all new software you should define UNICODE and use wchar_t directly. Using ANSI strings will come back to haunt you.

You should just use wchar_t and the wide versions of all the CRT functions (e.g. wcscmp instead of strcmp). The TEXT macros, TCHAR, etc. exist only for code that needs to build in both ANSI and Unicode environments, which I feel code rarely needs to do.

When you create a new Windows application using Visual Studio, UNICODE is automatically defined, so wchar_t works like a built-in type.
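A minimal sketch of what that looks like in practice (I call MessageBoxW explicitly here so it compiles regardless of the UNICODE setting):

    #include <windows.h>
    #include <wchar.h>

    int wmain()
    {
        const wchar_t* name = L"Direct2D Sample";

        // Wide CRT functions: wcscmp/wcslen instead of strcmp/strlen.
        if (wcscmp(name, L"Direct2D Sample") == 0)
            wprintf(L"Length: %d\n", (int)wcslen(name));

        // Explicit wide API call; with UNICODE defined, plain
        // MessageBox would resolve to this anyway.
        MessageBoxW(NULL, name, L"Demo", MB_OK);
        return 0;
    }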

obelix
+1  A: 

On Windows it's wchar_t with the UTF-16 encoding (2-byte code units).

Source : http://www.firstobject.com/wchar%5Ft-string-on-linux-osx-windows.htm
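A quick way to check this on your own toolchain:

    #include <cstdio>

    int main()
    {
        // Prints 2 on Windows (UTF-16 code units); typically 4 on
        // Linux/OS X, where wchar_t holds UTF-32.
        std::printf("sizeof(wchar_t) = %u\n", (unsigned)sizeof(wchar_t));
        return 0;
    }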

Klaim
+1  A: 

TCHAR changes its underlying type depending on whether UNICODE is defined, and should be used when you want code that can be compiled for both Unicode and non-Unicode builds.

If you want to explicitly process UNICODE data only, then feel free to use wchar_t.
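For example, this small sketch compiles unchanged as either an ANSI or a Unicode build (assuming UNICODE and _UNICODE are set consistently, as Visual Studio does):

    #include <windows.h>
    #include <tchar.h>

    int main()
    {
        // TCHAR and _TEXT track the _UNICODE setting, so the same
        // source builds with char or wchar_t strings.
        TCHAR buffer[64];
        _tcscpy_s(buffer, 64, _TEXT("portable string"));

        // Win32 API names are macros too: MessageBox expands to
        // MessageBoxA or MessageBoxW to match UNICODE.
        MessageBox(NULL, buffer, _TEXT("TCHAR demo"), MB_OK);
        return 0;
    }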

Richard
+5  A: 

Short answer: the hybrid infrastructure with the TCHAR type, the _TEXT() macro and the various _t* functions (_tcscpy comes to mind) is a throwback to the times when Microsoft had two platform lines coexisting:

  1. The Windows NT line was based on the Unicode string representation.
  2. The Windows 95/98/ME line was based on the ANSI string representation.

String representation here means that all the Windows APIs that expected or returned strings to your app used one or the other representation. COM added even more confusion, as it was available on both platforms -- and expected Unicode strings on both!

In those old times you were encouraged to write "portable" code: you were instructed to use the hybrid infrastructure for your strings so that you could compile for both models just by defining/undefining UNICODE and/or _UNICODE for your app.

As the Windows 9x line is no longer relevant (for the vast majority of apps, anyway), you can safely ignore the ANSI world and use Unicode strings directly.

Beware, though, that Unicode has multiple encodings today: as pointed out above, the encoding implied by wchar_t on Windows is UTF-16 (originally UCS-2, where every character fit in one 16-bit word). There are other widely used encodings, such as UTF-8, where this is not true.
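For example, if you need to hand a string to something that expects UTF-8 (a file format or a network protocol, say), you have to convert explicitly. A minimal sketch using the Win32 conversion API (the ToUtf8 helper name is mine):

    #include <windows.h>

    // Convert a NUL-terminated UTF-16 (wchar_t) string to UTF-8.
    // Returns the number of bytes written, or 0 on failure.
    int ToUtf8(const wchar_t* wide, char* out, int outBytes)
    {
        // For CP_UTF8 the last two arguments must be NULL.
        return WideCharToMultiByte(CP_UTF8, 0, wide, -1,
                                   out, outBytes, NULL, NULL);
    }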

LaszloG