What's the simplest way to convert a Unicode codepoint into a UTF-8 byte sequence in C? The only way that springs to mind is using iconv
to map from the UTF-32LE codepage to UTF-8, but that seems like overkill.
A:
Unicode conversion is not a simple task. Using iconv doesn't seem like overkill at all to me. Perhaps there is a library version of iconv you can use to avoid making a system() call, if that's what you want to avoid.
JesperE
2008-10-27 19:37:19
I was already planning on using the library.
Kevin Ballard
2008-10-27 19:53:03
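For reference, here is a minimal sketch of the iconv library route (no system() call involved). It assumes a POSIX iconv that accepts the encoding names "UTF-32LE" and "UTF-8" (glibc does; on some platforms, such as macOS, you may need to link with -liconv). The function name codepoint_to_utf8_iconv is hypothetical, chosen for illustration.

```c
#include <iconv.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts one code point to UTF-8 using iconv.
   Writes at most 4 bytes to out and returns how many were written,
   or -1 if the converter cannot be opened or rejects the input. */
int codepoint_to_utf8_iconv(uint32_t cp, char out[4])
{
    /* Serialize the code point as 4 little-endian bytes (UTF-32LE). */
    unsigned char in[4] = {
        (unsigned char)(cp & 0xFF),
        (unsigned char)((cp >> 8) & 0xFF),
        (unsigned char)((cp >> 16) & 0xFF),
        (unsigned char)((cp >> 24) & 0xFF)
    };

    /* Note the argument order: iconv_open(tocode, fromcode). */
    iconv_t cd = iconv_open("UTF-8", "UTF-32LE");
    if (cd == (iconv_t)-1)
        return -1;

    char *inbuf = (char *)in;
    size_t inleft = sizeof in;
    char *outbuf = out;
    size_t outleft = 4;

    /* iconv advances the buffer pointers and decrements the byte counts. */
    size_t rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;

    return (int)(4 - outleft);
}
```

Calling codepoint_to_utf8_iconv(0x20AC, buf) should leave the bytes E2 82 AC in buf and return 3. Opening a converter for every code point is wasteful; in real code you would likely open the iconv_t once and reuse it for the whole string.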
A:
UTF-8 works by encoding the length of each byte sequence in the highest bits of its first byte (every continuation byte starts with the bits 10). See http://en.wikipedia.org/wiki/UTF-8#Description
I found this small C function here: http://www.deanlee.cn/programming/convert-unicode-to-utf8/ (I haven't tested it, though).
devio
2008-10-27 19:47:06
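Following the bit layout described on that Wikipedia page (and in RFC 3629), a hand-written encoder takes only a few lines. This is a sketch, not the code from the linked page; the name utf8_encode is hypothetical, and it treats UTF-16 surrogates (U+D800..U+DFFF) and values above U+10FFFF as errors by returning 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encodes code point cp as UTF-8 into out (room for
   at least 4 bytes). Returns the number of bytes written, or 0 if cp is
   not a Unicode scalar value. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp <= 0x7F) {                  /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                 /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)  /* surrogates are not encodable */
        return 0;
    if (cp <= 0xFFFF) {                /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {              /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                          /* beyond U+10FFFF */
}
```

For example, utf8_encode(0xE9, buf) writes C3 A9 and returns 2, while utf8_encode(0x1F600, buf) writes F0 9F 98 80 and returns 4. The lead byte's high bits (0, 110, 1110, 11110) tell a decoder how many bytes follow.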
A:
Might I suggest ICU? It's a reasonably "industry standard" way of handling i18n issues.
I haven't used the C version myself, but I suspect ucnv_fromUnicode might be the function you're after.
Jon Skeet
2008-10-27 19:53:16
I'm not going to introduce dependencies on a new set of non-system-provided libraries just for this task. Thanks for the suggestion, though.
Kevin Ballard
2008-10-27 19:54:37