In Perl, I can say
my $s = "r\x{e9}sum\x{e9}";
to assign "résumé" to $s. I want to do something similar in C. Specifically, I want to say
sometype_that_can_hold_utf8 c = get_utf8_char();
if (c < '\x{e9}') {
/* do something */
}
wchar_t is the type you are looking for. It is declared in <wchar.h>: http://opengroup.org/onlinepubs/007908799/xsh/wchar.h.html
For UTF-8, you have to generate the encoding yourself from the UTF-8 encoding rules. For example, the German sharp s (ß, code point 0xdf) has the UTF-8 encoding 0xc3, 0x9f. Your e-acute (é, code point 0xe9) has the UTF-8 encoding 0xc3, 0xa9.
You can then put those bytes into your strings with hex escapes:
const char *cv = "r\xc3\xa9sum\xc3\xa9";  /* "résumé" as UTF-8 bytes */
const char *sharpS = "\xc3\x9f";          /* "ß" as UTF-8 bytes */
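The encoding rules mentioned above can be sketched as a small function. This is only an illustration: the name utf8_encode and its signature are made up for this example and do not come from any standard library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a standard function): encode one Unicode
   code point as UTF-8. Writes 1 to 4 bytes into out and returns the
   byte count, or 0 if cp is a surrogate or lies above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {    /* surrogates are not encodable */
        return 0;
    }
    if (cp < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* beyond the Unicode range */
}
```

Calling utf8_encode(0xe9, buf) fills buf with 0xc3, 0xa9 and returns 2, which matches the bytes written by hand for é above.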
If you have a C99 compiler, you can use <wchar.h> (and <locale.h>) and enter the Unicode code points directly in the source as universal character names (\uXXXX).
$ cat wc.c
#include <locale.h>
#include <stdio.h>
#include <wchar.h>
int main(void) {
    const wchar_t *name = L"r\u00e9sum\u00e9";
    setlocale(LC_CTYPE, "en_US.UTF-8");
    wprintf(L"name is %ls\n", name);
    return 0;
}
$ /usr/bin/gcc -std=c99 -pedantic -Wall wc.c
$ ./a.out
name is résumé
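The same notation works in wide character constants, which gets close to the comparison asked for in the question. A minimal sketch, assuming wchar_t values are Unicode code points on your platform (true for glibc, which defines __STDC_ISO_10646__); the function name is made up for illustration:

```c
#include <wchar.h>

/* Hypothetical helper: returns nonzero if c has a lower code than
   e-acute (U+00E9). Assumes wchar_t holds Unicode code points; the
   C standard leaves the wide character encoding up to the implementation. */
int sorts_before_e_acute(wchar_t c)
{
    return c < L'\u00e9';
}
```

Note that this compares code point values only. It is not a locale-aware collation; for that, look at wcscoll().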