ansaurus

Question

unicode hello world for C?

Answer 1

+1 A:

You have to configure your system to accept those characters. What are you using? Windows, Linux?

fbinder 2009-04-24 21:11:12

I'm using Linux. $ locale shows everything set to en_US.utf8.

numeric 2009-04-24 21:30:29

Answer 2

+9 A:

You have to set your output terminal as Unicode compatible.

On Linux (with Bash shell), try:

$ export LANG=en_US.UTF-8

and also make sure that your terminal emulator can actually display Unicode and is configured to do so.

Alnitak 2009-04-24 21:17:41

Answer 3

+3 A:

The C++ standard describes wchar_t as follows (C treats it similarly, as an implementation-defined integer type):

Type wchar_t is a distinct type whose values can represent distinct codes for all members of the largest extended character set specified among the supported locales (22.1.1). [...]

The difference between multibyte characters and wchar_t:

multibyte characters may require more than one byte for a given character, depending on the encoding (e.g. UTF-8, UTF-16)

whereas

wchar_t has a fixed size, sizeof(wchar_t), which is implementation-defined. Note that this width determines which encodings your wchar_t can hold. So, if sizeof(wchar_t) == 2 (with 8-bit bytes), there's no way you'd be able to use UTF-32 encoding.

Also remember that wchar_t has no sense of encoding by itself. You first have to tell the compiler (and, at run time, the C library through the locale) which encoding to use for wchar_t data. The erroneous output is most probably caused by the characters being treated in the default encoding, which can't represent them properly; the failed match then leads to a 'notdef'-style '?' output.

dirkgently 2009-04-24 21:19:42

wchar_t is not necessarily multibyte - it can be one byte long.

Blank Xavier 2009-04-24 21:23:25

Well yes, I should have been more pedantic :-)

dirkgently 2009-04-24 21:32:41

Answer 4

+5 A:

There are many individual stages in the process of getting Unicode output - all of which must be correctly configured.

First, are you compiling with Unicode support enabled? You will need to do so under Windows (-D UNICODE -D _UNICODE).

Second, are you emitting to a command line that supports Unicode, both in principle and in practice, i.e. with a font containing the glyphs of the characters you are emitting?

Third, do the Unicode encodings used by your compiler and your command line match? It's no use having UCS-2 in your binary when your command line expects UTF-8.

You really need to understand Unicode and its encodings to get this right. Don't imagine it's straightforward or that you can skip the underlying concepts; this stuff doesn't work by accident, because too many things have to be exactly correct.

Blank Xavier 2009-04-24 21:22:40

Answer 5

A:

Just as Alnitak suggested, you have to specify a locale with a character set/encoding that includes the characters you want to show. (Unicode/)UTF-8 should cover all Unicode characters.

Your terminal should use a font that has the respective glyphs.

Windows' CMD.EXE is notoriously weak when it comes to character sets beyond 8 bits. Perhaps you'd need a GUI pane instead of relying on stdout.

eel ghEEz 2009-04-24 22:38:05
