Note:

/*
* Trivial code
*/
wchar_t *greeting = L"Hello World!";
char *greeting_ = "Hello World!";

WinDbg:

0:000> ?? greeting
wchar_t * 0x00415810
"Hello World!"
0:000> ?? greeting_
char * 0x00415800
"Hello World!"

0:000> db 0x00415800
00415800  48 65 6c 6c 6f 20 57 6f-72 6c 64 21 00 00 00 00  Hello World!....
00415810  48 00 65 00 6c 00 6c 00-6f 00 20 00 57 00 6f 00  H.e.l.l.o. .W.o.
00415820  72 00 6c 00 64 00 21 00-00 00 00 00 00 00 00 00  r.l.d.!.........

Question:

  • What is the purpose of the null bytes (00) between the ASCII characters in a wchar_t string on Win32?
+1  A: 

wchar_t is for Unicode, while char is for standard 8-bit characters such as ASCII.

In wchar_t, every character is represented in 16 bits, but "standard" characters sit in the lower part of the range: ASCII letters have values below 0x80, so their high byte is 00. Traditional Chinese characters, for example, would have values other than 00 in those bytes.

Eric
+7  A: 

wchar_t is a wide-character type, so on Win32 each character takes 2 bytes of storage. 'H' as a wchar_t is 0x0048. Since x86 is little-endian, the bytes appear in memory in the order 48 00.

`db` in WinDbg dumps the bytes and shows how they look when read as an ASCII string, hence the H.e.l.l.o. output you see. You can use `du` to dump the memory as a Unicode string.

Michael
And `da` for ASCII - cool! Thanks Michael
Aaron
There is no guarantee that sizeof(wchar_t) = 2. Here (on GNU/Linux) it's 4 bytes.
Bastien Léonard
@Bastien: If you look at the tags, I am concerned with Win32 x86 only - I understand the size differs between OSs...
Aaron
@Atklin: even on the same OS, there is no guarantee that every compiler defines wchar_t the same way.
Bastien Léonard
@Bastien: this is true - YES
Aaron
+2  A: 

The answer is that on Win32, wchar_t characters are 16-bit quantities, so each requires two bytes. Each one holds a UTF-16 code unit. Since the letters you're using are within the ASCII range, their values are < 256, so the high byte of each 2-byte pair is zero.

Charlie