I am trying to output characters like 안, 蠀, ☃ from C:

#include <wchar.h>
int main()
{
    fwprintf(stdout, L"안, 蠀, ☃\n");
    return 0;
}

The output is ?, ?, ?

How do I print those characters?

Edit:

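#include <locale.h>
#include <stdio.h>   /* stdout */
#include <wchar.h>
int main()
{
    /* Adopt the locale from the environment (LANG, LC_*) so wide
       characters are converted to the terminal's encoding. */
    setlocale(LC_CTYPE, "");
    fwprintf(stdout, L"안, 蠀, ☃\n");
    return 0;
}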

This did the trick: the output is now 안, 蠀, ☃, except that the Chinese character and the snowman appear as boxes in my urxvt, probably because I did not enable those locales.

$ locale -a
C
en_US
en_US.iso88591
en_US.iso885915
en_US.utf8
ja_JP.utf8
ko_KR
ko_KR.euckr
ko_KR.utf8
korean
korean.euc
POSIX
zh_CN.utf8

Which locale do I have to enable additionally so that it displays the Chinese character and the snowman? Or do I need a font?

Will the above program work on Windows?

+1  A: 

You have to configure your system to accept those characters. What are you using? Windows, Linux?

fbinder
I'm using Linux. $ locale shows everything set to en_US.utf8
numeric
+9  A: 

You have to set your output terminal as Unicode compatible.

On Linux (with Bash shell), try:

$ export LANG=en_US.UTF-8

and also make sure that your terminal emulator can actually display Unicode and is configured to do so.

Alnitak
+3  A: 

The C++ standard defines wchar_t as follows (C's wchar_t serves the same purpose):

Type wchar_t is a distinct type whose values can represent distinct codes for all members of the largest extended character set specified among the supported locales (22.1.1). [...]

The difference between multibyte characters and wchar_t:

multibyte characters may require more than one byte for a given character depending on the encoding (e.g. UTF-8, UTF-16)

whereas

wchar_t has a fixed size, i.e. sizeof(wchar_t), which is implementation-defined. Note that this width determines which encoding(s) your wchar_t can support. So if sizeof(wchar_t) == 2, there's no way you'd be able to use UTF-32 encoding.

Also remember that wchar_t has no sense of encoding by itself. The encoding used when wide characters are written out comes from the program's locale, which you have to set first (for example with setlocale); until then the default "C" locale applies. The erroneous output is most probably because the characters are being converted in that default encoding, which can't represent them, and a failed match leads to a 'notdef'-style '?' output.

dirkgently
wchar_t is not necessarily multibyte - it can be one byte long.
Blank Xavier
Well yes, I should have been more pedantic :-)
dirkgently
+5  A: 

There are many individual stages in the process of getting Unicode output - all of which must be correctly configured.

First, are you compiling with Unicode support enabled? You will need to do so under Windows (-D UNICODE -D _UNICODE).

Second, are you emitting to a command line which supports Unicode, both in principle and by having a font that contains the glyphs of the characters you are emitting?

Third, do the Unicode encodings used by your compiler and your command line match? It's no use having UCS-2 in your binary when your command line expects UTF-8.

You basically need to really understand Unicode and its encodings to get this right. Don't imagine it's straightforward or that you don't need to learn all the underlying concepts; this stuff doesn't work by accident because there are too many things which have to be exactly correct.

Blank Xavier
A: 

Just as Alnitak suggested, one has to specify a locale with a character set/encoding that includes the characters you want to show. (Unicode/)UTF-8 should cover all Unicode characters.

Your terminal should use a font that has respective glyphs.

Windows' CMD.EXE is notoriously weak when it comes to character sets beyond 8 bits. Perhaps you'd need a GUI pane instead of relying on stdout.

eel ghEEz