I am working on internationalizing the input for a C/C++ application. I have hit an issue converting a multibyte string to a wide character string.

The code needs to be cross platform compatible, so I am using mbstowcs and wcstombs as much as possible.

I am currently working on a WIN32 machine and I have set the locale to a non-English locale (Japanese).

When I attempt to convert a multibyte character string, I seem to be having some conversion issues.

Here is an example of the code:

#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <windows.h>

int main(int argc, char** argv)
{
    wchar_t *wcsVal = NULL;
    char *mbsVal = NULL;
    size_t size;

    /* Get the current code page, in my case 932; runs only on Windows */
    TCHAR szCodePage[10];
    int cch = GetLocaleInfo(
            GetSystemDefaultLCID(),
            LOCALE_IDEFAULTANSICODEPAGE,
            szCodePage,
            sizeof(szCodePage) / sizeof(szCodePage[0])); /* size in TCHARs, not bytes */

    /* verify locale is set */
    if (setlocale(LC_CTYPE, "") == NULL)
    {
        fprintf(stderr, "Failed to set locale\n");
        return 1;
    }

    /* make sure an argument was actually supplied */
    if (argc < 2)
    {
        fprintf(stderr, "Usage: %s <multibyte string>\n", argv[0]);
        return 1;
    }
    mbsVal = argv[1];

    /* validate multibyte string and get its length in wide characters */
    size = mbstowcs(NULL, mbsVal, 0);
    if (size == (size_t)-1)
    {
        printf("Invalid multibyte\n");
        return 1;
    }
    wcsVal = (wchar_t*) malloc(sizeof(wchar_t) * (size + 1));
    if (wcsVal == NULL)
    {
        printf("memory issue \n");
        return 1;
    }

    /* convert, including the terminating null */
    mbstowcs(wcsVal, mbsVal, size + 1);
    wprintf(L"%ls \n", wcsVal);
    free(wcsVal);
    return 0;
}

At the end of execution, the wide character string does not contain the converted data. I believe there is an issue with the code page settings, because when I use MultiByteToWideChar and pass in the current code page, for example

MultiByteToWideChar( CP_ACP, 0, mbsVal, -1, wcsVal, size + 1 );

in place of the mbstowcs calls, the conversion succeeds.

My question is, how do I use the generic mbstowcs call instead of the MultiByteToWideChar call?

A: 

Calling mbstowcs is never as good an idea as MultiByteToWideChar on Windows. Don't bother figuring this out, just stick with the Win32 APIs.
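
If the rest of the code still has to build on other platforms, one way to follow this advice is to hide the call behind a small wrapper. The following is only a sketch: mbs_to_wcs_dup is a hypothetical helper name, CP_ACP and a flags value of 0 are taken from the call in the question, and the non-Windows branch assumes that setlocale(LC_CTYPE, ...) has already selected a suitable locale.

```c
#include <stdlib.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: convert a null-terminated multibyte string into a
 * newly allocated wide string. Returns NULL on failure; the caller frees
 * the result with free(). */
wchar_t *mbs_to_wcs_dup(const char *mbs)
{
    wchar_t *wcs;
#ifdef _WIN32
    /* With a zero-sized output buffer the call returns the number of
     * wchar_t needed, including the terminator (input length is -1). */
    int needed = MultiByteToWideChar(CP_ACP, 0, mbs, -1, NULL, 0);
    if (needed <= 0)
        return NULL;
    wcs = (wchar_t *)malloc(sizeof(wchar_t) * (size_t)needed);
    if (wcs == NULL)
        return NULL;
    if (MultiByteToWideChar(CP_ACP, 0, mbs, -1, wcs, needed) == 0) {
        free(wcs);
        return NULL;
    }
#else
    /* Portable path: mbstowcs with a NULL destination returns the length
     * without the terminator, or (size_t)-1 for an invalid sequence. */
    size_t len = mbstowcs(NULL, mbs, 0);
    if (len == (size_t)-1)
        return NULL;
    wcs = (wchar_t *)malloc(sizeof(wchar_t) * (len + 1));
    if (wcs == NULL)
        return NULL;
    if (mbstowcs(wcs, mbs, len + 1) == (size_t)-1) {
        free(wcs);
        return NULL;
    }
#endif
    return wcs;
}
```

In the program from the question, this would be called as wcsVal = mbs_to_wcs_dup(argv[1]); followed by a NULL check, with free(wcsVal) once the string is no longer needed.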

bmargulies
+1  A: 

What do you get if you print the string returned by setlocale()? That will indicate what locale has actually been set, which may not be the one that you expect.

MSDN indicates that on Windows, the default locale chosen for "" is "the user-default ANSI code page obtained from the operating system". Perhaps this is a different beast to the current ANSI code page?
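
A quick way to check is something like the sketch below. apply_ctype_locale is a hypothetical helper name, not part of any library; it only wraps the standard setlocale call and prints what the C runtime reports. Passing "" asks for the environment's default, as in the question.

```c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: request a locale for LC_CTYPE and print the name
 * the runtime actually selected. Returns that name, or NULL if the
 * request was rejected. */
const char *apply_ctype_locale(const char *requested)
{
    const char *actual = setlocale(LC_CTYPE, requested);
    if (actual == NULL) {
        fprintf(stderr, "setlocale(LC_CTYPE, \"%s\") failed\n", requested);
        return NULL;
    }
    /* The returned string names the locale now in effect. */
    printf("LC_CTYPE locale in effect: %s\n", actual);
    /* MB_CUR_MAX is the maximum number of bytes per multibyte character
     * in the current locale; 1 suggests a single-byte code page. */
    printf("MB_CUR_MAX: %d\n", (int)MB_CUR_MAX);
    return actual;
}
```

Calling apply_ctype_locale("") at the top of main and comparing the printed name with the code page you expect (932 in your case) should show whether mbstowcs is working from the locale you think it is.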

caf