Hi, I was trying to output a Unicode string to a console with iostreams and failed.

I found this: "Using unicode font in c++ console app", and this snippet works:

SetConsoleOutputCP(CP_UTF8);
wchar_t s[] = L"èéøÞǽлљΣæča";
int bufferSize = WideCharToMultiByte(CP_UTF8, 0, s, -1, NULL, 0, NULL, NULL);
char* m = new char[bufferSize];
WideCharToMultiByte(CP_UTF8, 0, s, -1, m, bufferSize, NULL, NULL);
wprintf(L"%S", m);
delete[] m; // release the conversion buffer

However, I did not find any way to output unicode correctly with iostreams. Any suggestions?

This does not work:

SetConsoleOutputCP(CP_UTF8);
utf8_locale = locale(old_locale,new boost::program_options::detail::utf8_codecvt_facet());
wcout.imbue(utf8_locale);
wcout << L"¡Hola!" << endl;

EDIT: I could not find any solution other than wrapping this snippet in a stream operator. I hope somebody has a better idea.

//Unicode output for a Windows console 
ostream &operator-(ostream &stream, const wchar_t *s) 
{ 
    int bufSize = WideCharToMultiByte(CP_UTF8, 0, s, -1, NULL, 0, NULL, NULL);
    char *buf = new char[bufSize];
    WideCharToMultiByte(CP_UTF8, 0, s, -1, buf, bufSize, NULL, NULL);
    wprintf(L"%S", buf);
    delete[] buf; 
    return stream; 
} 

ostream &operator-(ostream &stream, const wstring &s) 
{ 
    stream - s.c_str();
    return stream; 
} 
A: 

You need to use a UTF-8 std::codecvt facet for iostreams, and you additionally need to call SetConsoleOutputCP(CP_UTF8). Boost has one: http://www.boost.org/doc/libs/1_42_0/libs/serialization/doc/codecvt.html

EDIT: You could also keep your existing snippet as-is and use it to output a string buffer that you build yourself with a std::wstringstream.

Billy ONeal
Can you provide a working code example? I tried using both SetConsoleOutputCP and UTF-8 codecvt with no success. It works for files but not for a console in Windows. That is why I started the thread.
Andrew
@Andrew: Unfortunately I don't, as I've never had to do this myself.
Billy ONeal
A: 

If it will help you, there is some discussion here: http://www.velocityreviews.com/forums/t594301-displaying-unicode-characters-on-the-windows-console.html

lkessler
Thanks, no change :(
Andrew
A: 

I don't think there is an easy answer. Looking at Console Code Pages and the SetConsoleCP function, it seems that you will need to set up an appropriate code page for the character set you're going to output.

call me Steve
+1  A: 

wcout must have its locale set differently from the CRT's. Here's how it can be fixed:

int _tmain(int argc, _TCHAR* argv[])
{
    char* locale = setlocale(LC_ALL, "English"); // Set the CRT's locale and get its full name.
    std::locale lollocale(locale);
    setlocale(LC_ALL, locale); // Re-apply that locale to the CRT.
    std::wcout.imbue(lollocale); // Now give std::wcout the same locale as the CRT.
    std::wcout << L"¡Hola!";
    std::cin.get();
    return 0;
}

I just tested it, and it displays the string here absolutely fine.

DeadMG
Thanks for the new idea. It worked for this string, but it fails for something more complicated, like "¡Hola! αβγ ambulō привет :)"
Andrew
That string didn't work with wprintf for me either; it just came out as a total blank. wcout got at least some of the characters right. Could you double-check that wprintf gets this string right?
DeadMG
Yes, if you select the correct fonts for the console and start it with cmd.exe, it works.
Andrew
A: 

Recently I wanted to stream Unicode from Python to the Windows console, and here is the minimum I needed to do:

  • You should set the console font to one that covers the Unicode symbols. There is not a wide choice: Console properties > Font > Lucida Console
  • You should change the current console code page: run chcp 65001 in the console or use the corresponding function in the C++ code
  • Write to the console using WriteConsoleW

Look through an interesting article about Java Unicode on the Windows console.

Besides, in Python you cannot write to the default sys.stdout in this case; you will need to substitute it with something that uses os.write(1, binarystring) or a direct call to a wrapper around WriteConsoleW. It seems that in C++ you will need to do the same.

newtover
Thanks, this is essentially what I did by overloading operator -
Andrew
A: 

First, sorry, I probably don't have the required fonts, so I cannot test it yet.

Something looks a bit fishy here

// the following is said to be working
SetConsoleOutputCP(CP_UTF8); // output is in UTF8
wchar_t s[] = L"èéøÞǽлљΣæča";
int bufferSize = WideCharToMultiByte(CP_UTF8, 0, s, -1, NULL, 0, NULL, NULL);
char* m = new char[bufferSize]; 
WideCharToMultiByte(CP_UTF8, 0, s, -1, m, bufferSize, NULL, NULL);
wprintf(L"%S", m); // <-- upper case %S in wprintf() is used for MultiByte/utf-8
                   //     lower case %s in wprintf() is used for WideChar
printf("%s", m); // <-- does this work as well? try it to verify my assumption

while

// the following is said to have a problem
SetConsoleOutputCP(CP_UTF8);
utf8_locale = locale(old_locale,
                     new boost::program_options::detail::utf8_codecvt_facet());
wcout.imbue(utf8_locale);
wcout << L"¡Hola!" << endl; // <-- you are passing wide char.
// have you tried passing the multibyte equivalent by converting to utf8 first?
int bufferSize = WideCharToMultiByte(CP_UTF8, 0, s, -1, NULL, 0, NULL, NULL);
char* m = new char[bufferSize]; 
WideCharToMultiByte(CP_UTF8, 0, s, -1, m, bufferSize, NULL, NULL);
cout << m << endl;

what about

// without setting locale to UTF8, you pass WideChars
wcout << L"¡Hola!" << endl;
// set the console code page to UTF-8 and use cout
SetConsoleOutputCP(CP_UTF8);
cout << utf8_encoded_by_converting_using_WideCharToMultiByte << endl;
afriza
That is the fun part. I tried it and was surprised that it does not work, but thanks anyway.
Andrew