Quick question regarding UTF-8 support and various Win32 APIs.

In a typical C++ MFC project, is it possible for MessageBox() to display a UTF-8 encoded string?

Thanks, Andrew

+5  A: 

Quick answer: No.

Longer answer: It'll work if the string contains only 7-bit ASCII characters (e.g. plain US English text), since those character codes are the same in UTF-8 and in the ANSI code pages.
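To make the "ASCII only" case concrete, here is a minimal sketch of a check you could run before deciding whether a conversion is needed. IsPureAscii is a hypothetical helper name, not part of Win32 or MFC.

```cpp
#include <string>

// Hypothetical helper (not a Win32/MFC function): returns true when every
// byte is 7-bit ASCII (0x00-0x7F). Such bytes mean the same thing in UTF-8
// and in the ANSI code pages, so the string can be passed through unchanged.
bool IsPureAscii(const std::string& s)
{
    for (unsigned char c : s) {
        if (c > 0x7F) {
            return false;  // lead or continuation byte of a multi-byte UTF-8 sequence
        }
    }
    return true;
}
```

If this returns false, the string should be converted before it reaches MessageBox.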

If non-ANSI characters are included, or any double-byte encoded characters, you'll need to transform to Unicode-16 using MultiByteToWideChar with CP_UTF8. Your program will also need to be compiled with UNICODE defined, or you can use the 'W' API calls - e.g. MessageBoxW.

(Note that functions taking a text argument, such as MessageBox and CreateWindow, are macros that map to either the 'A' or the 'W' version depending on whether UNICODE is defined.)
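The conversion described above can be sketched roughly as follows. This is only an illustration: Utf8ToWide is a made-up helper name, the flags argument is left at 0, and the whole thing is wrapped in an _WIN32 guard because it only makes sense on Windows.

```cpp
#ifdef _WIN32  // Win32-only sketch; compiles to nothing on other platforms
#include <windows.h>
#include <string>

// Hypothetical helper (the name is an assumption, not a Win32 API):
// converts UTF-8 bytes to a UTF-16 std::wstring via MultiByteToWideChar.
std::wstring Utf8ToWide(const std::string& utf8)
{
    if (utf8.empty()) {
        return std::wstring();
    }

    // First call: no output buffer, so the return value is the number of
    // wchar_t units the converted text needs.
    const int len = MultiByteToWideChar(CP_UTF8, 0,
                                        utf8.data(), static_cast<int>(utf8.size()),
                                        nullptr, 0);
    if (len <= 0) {
        return std::wstring();  // conversion failed
    }

    // Second call: convert into a buffer of exactly that size.
    std::wstring wide(static_cast<size_t>(len), L'\0');
    MultiByteToWideChar(CP_UTF8, 0,
                        utf8.data(), static_cast<int>(utf8.size()),
                        &wide[0], len);
    return wide;
}
#endif
```

With a helper like this, the call would look something like MessageBoxW(hWnd, Utf8ToWide(utf8Text).c_str(), L"Caption", MB_OK), where hWnd and utf8Text stand in for your own window handle and string.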

This may also be of use:

http://www.joelonsoftware.com/articles/Unicode.html

Andrew Grant
Just a bit of terminology, but it's called UTF-16. There's no such thing as Unicode-16. :)
jalf
+3  A: 

Nope, use MultiByteToWideChar with CP_UTF8. See http://blogs.msdn.com/michkap/816996.aspx for why A can't do it; W (UCS-2) is the only alternative.

Mark
+1 with little nitpicking: W version is UTF-16, not UCS-2 - it handles surrogate pairs as well.
Nemanja Trifunovic
It does from XP, yes.
Mark
For more on the difference, see http://blogs.msdn.com/michkap/416552.aspx
Mark
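For anyone wondering what the UCS-2 vs UTF-16 distinction in these comments means in practice, here is a small portable sketch. EncodeUtf16 is a hypothetical helper, and it assumes the input is a valid Unicode scalar value (at most 0x10FFFF and not itself a surrogate).

```cpp
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-16 code units.
// Assumes cp is a valid scalar value; no validation is done here.
std::u16string EncodeUtf16(char32_t cp)
{
    std::u16string out;
    if (cp < 0x10000) {
        // Basic Multilingual Plane: one 16-bit unit, same in UCS-2 and UTF-16.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Above the BMP: UTF-16 needs a surrogate pair, which UCS-2 cannot express.
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));    // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));  // low surrogate
    }
    return out;
}
```

So U+20AC (the euro sign) stays a single unit, while something like U+1F600 becomes two, and an API that only understood UCS-2 would treat those two units as separate characters.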
A: 

I use the ATL/MFC string conversion macros. For example, if you have a narrow (char) string called myUTF8Str holding UTF-8 encoded text:

::MessageBox(hWnd, CA2T(myUTF8Str, CP_UTF8), _T("Caption"), MB_OK);

Alternatively you can create an instance of the string, e.g.:

CA2T myConvertedString(myUTF8Str, CP_UTF8);
...
TRACE(_T("Converted: %s\n"), myConvertedString.m_psz);

Note the m_psz member that allows read-only access to the raw string pointer.

You can also encode using CT2A, e.g.:

CT2A myEncodedString(_T("Some UTF8"), CP_UTF8);

If you don't use the TCHAR text macros (_T, TEXT), then use CA2W, CW2A, etc. directly.

Rob
A: 

Cool, thanks for all the help and reference material!