This happens on OS X, though I suspect it applies to any UNIX-y OS. I have two strings that look like this:

const wchar_t *test1 = (const wchar_t *)"\x44\x00\x00\x00\x73\x00\x00\x00\x00\x00\x00\x00";
const wchar_t *test2 = (const wchar_t *)"\x44\x00\x00\x00\x19\x20\x00\x00\x73\x00\x00\x00\x00\x00\x00\x00";

In the debugger, test1 looks like "Ds" and test2 looks like "D's" (with the curly apostrophe). I then call this code:

wchar_t buf1[100], buf2[100];
int ret1 = swprintf(buf1, 100, L"%ls", test1);
int ret2 = swprintf(buf2, 100, L"%ls", test2);

The first swprintf call works fine. The second one returns -1 (and the buffer is unchanged).

I'm guessing the problem has something to do with locales but googling around didn't provide me with anything useful. This is the simplest way to reproduce the problem I'm seeing. What I'm really interested in is vswprintf(), but I assume that's closely related.

Why does swprintf choke on a Unicode character that is outside the 8-bit range? Is there any way to work around this?

+3  A: 

Try explicitly setting the locale to UTF-8.

setlocale(LC_CTYPE, "UTF-8");
...
const wchar_t* test2 = L"D\x2019s";
int ret2 = swprintf(buf2, 100, L"%ls", test2);
...
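
One possible way to flesh out the fragment above into something compilable is sketched below. The helper names use_utf8_ctype and format_wide are made up for illustration. The list of locale names is an assumption: "UTF-8" is the name used on OS X in this answer, while other systems typically spell it "C.UTF-8" or "en_US.UTF-8". Whether the call fails at all in the "C" locale also depends on the C library, so treat this as a sketch rather than a portable recipe.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: try a few common names for a UTF-8 LC_CTYPE locale.
   The names are platform-dependent (assumption): "UTF-8" is accepted on
   OS X, other systems usually want "C.UTF-8" or "en_US.UTF-8".
   Returns 1 if one of the names was accepted, 0 otherwise. */
static int use_utf8_ctype(void)
{
    static const char *const names[] = { "UTF-8", "C.UTF-8", "en_US.UTF-8" };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; ++i) {
        if (setlocale(LC_CTYPE, names[i]) != NULL)
            return 1;
    }
    return 0;
}

/* Hypothetical helper: copy a wide string through swprintf's %ls.
   Returns the number of wide characters written, or a negative value
   if swprintf fails (as reported in the question). */
static int format_wide(wchar_t *buf, size_t n, const wchar_t *s)
{
    assert(buf != NULL && s != NULL);
    return swprintf(buf, n, L"%ls", s);
}
```

Call use_utf8_ctype() once at startup, before any wide-character formatting, then format_wide(buf2, 100, L"D\x2019s") should no longer return -1 if a UTF-8 locale was found.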
KennyTM
That seems to work, thanks. So... let's see if I understand what's going on here. The program starts in the default "C" locale, which only covers the basic ASCII character set (the same as Basic Latin). That curly apostrophe is not expressible in that character set, so the string functions refuse to deal with it. By switching to a locale that can express any Unicode character, the string functions start working.
mhenry1384
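
Since the question is really about vswprintf(), here is a minimal sketch of a variadic wrapper around it. The names wformat and demo_wformat are hypothetical. It assumes vswprintf depends on LC_CTYPE in the same way as swprintf, which seems likely but is not confirmed in this thread. The locale names tried are also assumptions and vary by platform.

```c
#include <errno.h>
#include <locale.h>
#include <stdarg.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: printf-style formatting into a wide buffer.
   Returns the number of wide characters written, or a negative value on
   failure. errno is cleared first so the caller can inspect what
   vswprintf set (EILSEQ is typical for an unconvertible character, but
   that is not guaranteed everywhere). */
static int wformat(wchar_t *buf, size_t n, const wchar_t *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    errno = 0;
    int ret = vswprintf(buf, n, fmt, ap);
    va_end(ap);
    return ret;
}

/* Usage sketch: pick a UTF-8 LC_CTYPE locale first (the names are
   assumptions), then format the string from the question. */
static int demo_wformat(void)
{
    wchar_t buf[100];
    if (setlocale(LC_CTYPE, "UTF-8") == NULL &&
        setlocale(LC_CTYPE, "C.UTF-8") == NULL) {
        setlocale(LC_CTYPE, "en_US.UTF-8");
    }
    return wformat(buf, 100, L"%ls", L"D\x2019s");
}
```

If the return value is negative, checking errno right after the call can help tell a conversion failure apart from a buffer that is simply too small.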