I am trying to get Unicode working on Windows in a Visual Studio 2008 project, and I am not sure why it doesn't work. My machine has all the Eastern language support installed. I went to Properties -> Project Defaults -> Character Set, and it is set to "Use Unicode Character Set". Here is my test code:

#include <stdio.h>
#include <string>

#define ARAB "گـگـگ گ   لـلـل ل"
#define CHINESE "大夨天太夫"
#define VALUE CHINESE
#define LARAB L"گـگـگ گ   لـلـل ل"
#define LCHINESE L"大夨天太夫"
#define LVALUE LCHINESE

void AttemptStdString(FILE* file)
{
    std::string str(VALUE);
    printf("%s: %s, length(%d)\n",__FUNCTION__,str.c_str(),str.length());
    fprintf( file, "%s = %s\n",__FUNCTION__, str.c_str() );
}    

void AttemptStdWideString(FILE* file)
{
    std::wstring str = LVALUE;
    printf("%s: %s, length(%d)\n",__FUNCTION__,str.c_str(),str.length());
    fprintf( file, "%s = %s\n",__FUNCTION__, str.c_str() );
}    

void AttemptWCharT(FILE* file)
{
    wchar_t arry[] = {0x5927,0x5928,0x5929,0x592A,0x592B,0x0000};
    printf("%s: %s\n",__FUNCTION__,arry);
    wprintf(L"%s: %s\n",__FUNCTION__,arry);
    fprintf( file, "%s = %s\n",__FUNCTION__, arry );
    fwprintf(file,L"AttemptWCharT = %s\n",arry);
}


int main()
{
    FILE* outFile = fopen( "output.txt", "w" );
    AttemptStdString(outFile);
    AttemptStdWideString(outFile);
    AttemptWCharT(outFile);
    fclose(outFile);
    return 0;
}

The results I get at the terminal are:

AttemptStdString: ?????, length(5)
AttemptStdWideString: 'Y(Y)Y*Y+Y, length(5)
AttemptWCharT: 'Y(Y)Y*Y+Y
??????T: ?????

The results I get in the file are:

AttemptStdString = ?????
AttemptStdWideString = 'Y(Y)Y*Y+Y
AttemptWCharT = 'Y(Y)Y*Y+Y
AttemptWCharT = ?????
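
The `'Y(Y)Y*Y+Y` output is not random. MSVC's wchar_t is 16 bits, so L"大夨天太夫" is stored as the UTF-16LE code units 0x5927 to 0x592B, and each one lands in memory as the byte pair 27 59, 28 59, and so on. Handing that pointer to a narrow %s makes printf read those bytes as single characters (0x27 is ', 0x59 is Y) until it reaches a zero byte. A minimal sketch of that reinterpretation, using std::uint16_t in place of the 16-bit wchar_t so it behaves the same on compilers where wchar_t is 32 bits (read_utf16le_as_narrow is a hypothetical name):

```cpp
#include <cstdint>
#include <string>

// Simulates what a narrow "%s" does when handed a pointer to UTF-16LE data:
// it walks the memory byte by byte (low byte first on little-endian x86)
// and stops at the first 0x00 byte it meets.
std::string read_utf16le_as_narrow(const std::uint16_t* units)
{
    std::string out;
    for (const std::uint16_t* p = units; ; ++p) {
        unsigned char lo = static_cast<unsigned char>(*p & 0xFF);        // stored first
        unsigned char hi = static_cast<unsigned char>((*p >> 8) & 0xFF); // stored second
        if (lo == 0) break;
        out.push_back(static_cast<char>(lo));
        if (hi == 0) break;
        out.push_back(static_cast<char>(hi));
    }
    return out;
}
```

Called on the arry from AttemptWCharT, it returns `'Y(Y)Y*Y+Y`, the same text that shows up in the terminal and in output.txt.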

What "voodoo" am I missing? I am sure it is something simple that will make this work; it seems like I should be able to print my characters out fine, but it is failing. I have also checked that I can paste the characters into the text editor I am opening the file with, and they display fine. And I have tried both the "Lucida Console" and "Raster Fonts" options available to me for the Visual Studio terminal. Please help! What am I doing wrong?

Thank you!

A:

The Windows console doesn't display Unicode characters by default.

Klaim
Yep. The raster fonts definitely don't handle Unicode, and Lucida Console is a bastardised font that only handles *some* European language characters.
DaveE
Thank you, Dave, good to know. So how can I tell whether my code is handling Unicode properly if I can't print it out for debugging?
NSA
The text display of the Windows console just sucks. *Maybe* you can select a font like DejaVu Sans Mono, which contains a few more characters, but I think the console is still completely unable to display "complex" (i.e., non-European) scripts. Your best bet is probably to redirect standard output to a file.
Philipp
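
For the question of how to verify the data when the console can't show it, one option (a suggestion, not something from the thread) is to print the code units as hex numbers and compare them with a Unicode code chart: 大 is U+5927 and 夫 is U+592B. A minimal C++98-style sketch; dump_code_units is a hypothetical helper, and characters outside the Basic Multilingual Plane would appear as two surrogate units with Windows' 16-bit wchar_t but as one value with a 32-bit wchar_t:

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Formats each wchar_t code unit as "U+XXXX" (at least four uppercase hex
// digits), separated by spaces, so the contents can be checked against a
// code chart without depending on the console font.
std::string dump_code_units(const std::wstring& s)
{
    std::ostringstream out;
    out << std::hex << std::uppercase << std::setfill('0'); // these settings persist
    for (std::wstring::size_type i = 0; i < s.size(); ++i) {
        if (i != 0) out << ' ';
        out << "U+" << std::setw(4)  // setw applies only to the next value
            << static_cast<unsigned long>(s[i]);
    }
    return out.str();
}
```

If the compiler read the source file's encoding correctly, dump_code_units(LCHINESE) should give "U+5927 U+5928 U+5929 U+592A U+592B"; any other values point at a source-encoding problem rather than a display problem.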
A:

The problem is not entirely your code; it is also how you look at the text. The only reliable way your text editor can know that the file contains UTF-16 is a byte order mark (BOM), and you didn't write one. Use "ccs=UTF-16LE" in the _wfopen() mode string.
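
To make the BOM point concrete, here is a portable sketch of the bytes such a file should contain: FF FE, then each UTF-16 code unit low byte first. It writes them by hand in binary mode instead of using the CRT's ccs= support, so unlike a text-mode ccs=UTF-16LE stream it does not translate "\n" into "\r\n"; write_utf16le_file is a hypothetical name:

```cpp
#include <cstdio>
#include <vector>

// Writes a UTF-16LE byte order mark (FF FE) followed by the given UTF-16
// code units, low byte first. Returns false if any step fails.
bool write_utf16le_file(const char* path, const std::vector<unsigned short>& units)
{
    std::FILE* f = std::fopen(path, "wb");   // binary: bytes go out unchanged
    if (!f) return false;

    const unsigned char bom[2] = {0xFF, 0xFE};
    bool ok = std::fwrite(bom, 1, 2, f) == 2;

    for (std::size_t i = 0; ok && i < units.size(); ++i) {
        const unsigned char bytes[2] = {
            static_cast<unsigned char>(units[i] & 0xFF),        // low byte
            static_cast<unsigned char>((units[i] >> 8) & 0xFF)  // high byte
        };
        ok = std::fwrite(bytes, 1, 2, f) == 2;
    }
    return std::fclose(f) == 0 && ok;
}
```

A file written this way from the code units in arry should open as 大夨天太夫 in any editor that honours the BOM.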

There's a similar problem with the console: it cannot display UTF-16 encoded characters. It only handles 8-bit characters, so you'd have to convert the text to UTF-8 and call SetConsoleOutputCP().
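
A minimal sketch of the UTF-8 route described above: encode the UTF-16 code units as UTF-8 by hand, switch the console code page with SetConsoleOutputCP(CP_UTF8) only when compiling for Windows, and write the raw bytes. On Windows, WideCharToMultiByte with CP_UTF8 performs the same conversion; the hand-written encoder is here only to make the byte layout visible and keep the sketch portable. append_utf8, utf16_to_utf8 and print_utf8_line are hypothetical names, and, as the comments above note, the console font still has to contain the glyphs.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>
#ifdef _WIN32
#include <windows.h>   // SetConsoleOutputCP, CP_UTF8
#endif

// Appends the UTF-8 encoding of one code point (assumed <= 0x10FFFF).
void append_utf8(std::string& out, unsigned long cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Converts UTF-16 code units (the layout of a Windows wchar_t string) to
// UTF-8. Surrogate pairs are combined; an unpaired surrogate becomes U+FFFD.
std::string utf16_to_utf8(const std::vector<unsigned short>& units)
{
    std::string out;
    for (std::size_t i = 0; i < units.size(); ++i) {
        unsigned long cp = units[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < units.size()
            && units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (units[i + 1] - 0xDC00);
            ++i;                       // consumed the low surrogate too
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;               // replacement character
        }
        append_utf8(out, cp);
    }
    return out;
}

// Switches the console to UTF-8 (Windows only) and writes the bytes as-is.
void print_utf8_line(const std::string& utf8)
{
#ifdef _WIN32
    SetConsoleOutputCP(CP_UTF8);
#endif
    std::fwrite(utf8.data(), 1, utf8.size(), stdout);
    std::fputc('\n', stdout);
}
```

For 大 (U+5927), utf16_to_utf8 yields the three bytes E5 A4 A7.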

Another problem is the __FUNCTION__ macro. That's still an 8-bit character string, so in wprintf() and fwprintf() you have to use the %hs format specifier for it.
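
If you would rather not depend on the Microsoft-specific meaning of %s and %hs in the wide printf family, one portable alternative (an assumption, not what the answer proposes) is to widen the ASCII-only function name first and format both arguments with %ls, which means "wide string" in both MSVC and ISO C, whereas plain %s means different things in the two. widen_ascii and format_entry are hypothetical helpers:

```cpp
#include <cstring>
#include <cwchar>
#include <string>

// Widens an ASCII-only narrow string (such as __FUNCTION__) char by char.
// Not a general conversion: non-ASCII bytes would need mbstowcs or
// MultiByteToWideChar instead.
std::wstring widen_ascii(const char* s)
{
    return std::wstring(s, s + std::strlen(s));
}

// Formats "name = value" into a wide string, using %ls for both arguments.
// Returns an empty string if the output does not fit in the buffer.
std::wstring format_entry(const char* name, const wchar_t* value)
{
    wchar_t buf[256];
    std::wstring wname = widen_ascii(name);
    int n = std::swprintf(buf, sizeof buf / sizeof buf[0], L"%ls = %ls",
                          wname.c_str(), value);
    return n < 0 ? std::wstring() : std::wstring(buf, n);
}
```

Inside AttemptWCharT the same idea would read fwprintf(file, L"%ls = %ls\n", widen_ascii(__FUNCTION__).c_str(), arry); on a stream opened with ccs=UTF-16LE.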

Hans Passant
Hans, thank you for the help. By the way, it's "ccs=UTF-16LE", and it requires the following: _wfopen(L"file.txt", L"w+,ccs=UTF-16LE"); I found out the hard way that the mode "w" has to change to "w+" and that you also have to change from fopen to _wfopen.
NSA