ansaurus

Question

How can I embed unicode string constants in a source file?

Answer 1

+7 A:

You have to tell GCC which encoding your file uses to code those characters into the file.

Use the option -finput-charset=charset, for example -finput-charset=UTF-8. Then you need to tell it about the encoding used for those string literals at runtime. That will determine the values of the wchar_t items in the strings. You set that encoding using -fwide-exec-charset=charset, for example -fwide-exec-charset=UTF-32. Beware that the size of the encoding (utf-32 needs 32bits, utf-16 needs 16bits) must not exceed the size of wchar_t gcc uses.

You can adjust that. That option is mainly useful for compiling programs for wine, designed to be compatible with windows. The option is called -fshort-wchar, and will most likely then be 16bits instead of 32bits, which is its usual width for gcc on linux.

Those options are described in more detail in man gcc, the gcc manpage.

Johannes Schaub - litb 2009-01-14 12:26:08

Answer 2

+6 A:

A tedious but portable way is to build your strings using numeric escape codes. For example:

wchar_t *string = L"דונדארןמע";

becomes:

wchar_t *string = "\x05d3\x05d5\x05e0\x05d3\x05d0\x05e8\x05df\x05de\x05e2";

You have to convert all your Unicode characters to numeric escapes. That way your source code becomes encoding-independent.

You can use online tools for conversion, such as this one. It outputs the JavaScript escape format \uXXXX, so just search & replace \u with \x to get the C format.

fbonnet 2009-01-14 13:39:13

ansaurus

tags:

views:

answers:

How can I embed unicode string constants in a source file?

related questions