views:

1235

answers:

3
+7  Q: 

UTF-8 in Windows

How do I set the code page to UTF-8 in a C Windows program?

I have a third party library that has uses fopen to open files. I can use wcstombs to convert my Unicode filenames to the current code page, however if the user has a filename with a character outside the code page then this breaks.

Ideally I would just call _setmbcp(65001) to set the code page to UTF-8, however the MSDN documentation for _setmbcp states that UTF-8 is not supported.

How can I get around this?

A: 

Try setting the code page with a #pragma

Also can you add some detail? If I understand correctly you have a third party library that you CANT change that has a function that takes a const char string and you want to be able to pass it a Unicode string ?

Roman M
+2  A: 

All Windows APIs think in UTF-16, so you're better off writing a wrapper around your library that converts at the boundaries.

Oddly enough, Windows thinks UTF-8 is a codepage for the purposes of conversion, so you use the same APIs as you would to convert between codepages:

std::wstring Utf8ToUtf16(const char* u8string)
{
    int wcharcount = strlen(u8string);
    wchar_t *tempWstr = new wchar_t[wcharcount];
    MultiByteToWideChar(CP_UTF8, 0, u8string, -1, tempWstr, wcharcount);
    wstring w(tempWstr);
    delete [] tempWstr;
    return w;
}

And something of similar form to convert back.

Ben Straub
+7  A: 

Unfortunately, there is no way to make Unicode the current codepage in Windows. The CP_UTF7 and CP_UTF8 constants are pseudo-codepages, used only in MultiByteToWideChar and WideCharToMultiByte conversion functions, like Ben mentioned.

Your problem is similar to that of the fstream C++ classes. The fstream constructors accept only char* names, making impossible to open a file with a true Unicode name. The only solution offered by VC was a hack: open the file separately and then set the handle to the stream object. I'm afraid this isn't an option for you, of course, since the third party library probably doesn't accept handles.

The only solution I can think of is to create a temporary file with a non-Unicode name, which is hard-linked to the original, and use that as a parameter.

efotinis