ansaurus

Question

MultiByteToWideChar API changes on Vista

Answer 1

A:

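WCHAR *pstrRet = NULL;
// First call: query the required buffer size in WCHARs.
// Passing -1 as the source length includes the terminating null.
int nLen = MultiByteToWideChar(CP_UTF8, 0, pstrTemp2, -1, NULL, 0);
if (nLen > 0)
{
    pstrRet = new WCHAR[nLen];
    // Second call: perform the actual conversion.
    int nConv = MultiByteToWideChar(CP_UTF8, 0, pstrTemp2, -1, pstrRet, nLen);
    if (nConv == nLen)
    {
        // Success! pstrRet should be the wide char equivalent of pstrTemp2
    }
}
// delete[] on a null pointer is a no-op, so no check is needed.
delete[] pstrRet;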

I think this is the way it is used on Vista; I found it on some forum :)

Arjit 2010-08-10 10:45:42

This isn't what I was asking for. I'm asking about error handling in case of invalid characters

Artyom 2010-08-10 19:04:41

Answer 2

+2 A:

I think what it does is replace illegal code units with the replacement character (U+FFFD), as permitted by the Unicode standard. The following code

#define STRICT
#define UNICODE
#define NOMINMAX
#define WIN32_LEAN_AND_MEAN

#include <windows.h>

#include <cstdlib>
#include <iostream>
#include <iomanip>


void test(bool ignore_illegal) {
    const DWORD flags = ignore_illegal ? 0 : MB_ERR_INVALID_CHARS;
    WCHAR buf[0x100];
    SetLastError(0);
    // The last argument is the buffer size in WCHARs, not in bytes.
    const int res = MultiByteToWideChar(CP_UTF8, flags, "test\xFF\xFF test", -1, buf, sizeof buf / sizeof buf[0]);
    const DWORD err = GetLastError();
    std::cout << "ignore_illegal = " << std::boolalpha << ignore_illegal
        << ", result = " << std::dec << res
        << ", last error = " << err
        << ", fifth code unit = " << std::hex << static_cast<unsigned int>(buf[4])
        << std::endl;
}


int main() {
    test(false);
    test(true);
    std::system("pause");
}

produces the following output on my Windows 7 system:

ignore_illegal = false, result = 0, last error = 1113, fifth code unit = fffd
ignore_illegal = true, result = 12, last error = 0, fifth code unit = fffd

So the error codes stay the same, but the length is off by two, which accounts for the two replacement characters that have been inserted. If you run my code on XP and the two illegal code units are dropped instead, the fifth code unit should be U+0020 (the space character).

Philipp 2010-08-15 07:47:16

Thanks, that's what I was looking for. Is this behavior mentioned anywhere in the documentation?

Artyom 2010-08-15 10:25:15

Unfortunately not. The documentation only says that the function "does not drop illegal code points", but not what it does instead. The Unicode standard doesn't define how to treat illegal code unit sequences. It merely requires that they not be interpreted as characters, and that every legal code unit sequence be interpreted as such. So signaling an error, deleting the offending code unit sequences, or replacing them with a replacement character are all legal. I think I'll add a note to the comments of the documentation page.

Philipp 2010-08-15 17:52:29

@Philipp Thank you very much once again!

Artyom 2010-08-16 06:29:22

@Philipp - hello, I have awarded the bounty. Sorry for the delay; Stack Overflow changed the UI and I thought you only needed to accept the answer rather than click the "+XX" button. Thanks.

Artyom 2010-08-21 08:22:34
