What's the simplest way to convert a Unicode codepoint into a UTF-8 byte sequence in C? The only way that springs to mind is using iconv to map from the UTF-32LE codepage to UTF-8, but that seems like overkill.

+5  A: 

Unicode conversion is not a simple task. Using iconv doesn't seem like overkill at all to me. iconv is also available as a C library (iconv_open, iconv, iconv_close), so you can use that to avoid making a system() call, if that's what you want to avoid.
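For reference, a minimal sketch of the library route, assuming a POSIX-style iconv (as in glibc) that recognizes the encoding names "UTF-32LE" and "UTF-8". The helper name codepoint_to_utf8_iconv is made up for illustration, and some older platforms declare the input buffer argument as const char ** rather than char **.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert one code point to UTF-8 via iconv.
 * Returns the number of bytes written to out, or 0 on failure. */
size_t codepoint_to_utf8_iconv(uint32_t cp, char *out, size_t out_size)
{
    assert(out != NULL);

    /* Serialize the code point as little-endian bytes by hand so the
     * result does not depend on the host byte order. */
    char in[4] = {
        (char)(cp & 0xFF),
        (char)((cp >> 8) & 0xFF),
        (char)((cp >> 16) & 0xFF),
        (char)((cp >> 24) & 0xFF)
    };

    /* Note the argument order: target encoding first, then source. */
    iconv_t cd = iconv_open("UTF-8", "UTF-32LE");
    if (cd == (iconv_t)-1)
        return 0;

    char *in_ptr = in;
    size_t in_left = sizeof in;
    char *out_ptr = out;
    size_t out_left = out_size;

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return 0;
    return out_size - out_left;
}
```

Opening a converter for every code point is wasteful; in real code the iconv_t handle would likely be opened once and reused.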

JesperE
I was already planning on using the library.
Kevin Ballard
+1  A: 

UTF-8 works by encoding the length of the byte sequence in the highest bits of the first byte; every following byte starts with the bits 10 and carries six bits of the codepoint. See http://en.wikipedia.org/wiki/UTF-8#Description

I found this small C function here: http://www.deanlee.cn/programming/convert-unicode-to-utf8/ (I haven't tested it, though).
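To illustrate the bit layout described above, here is a minimal sketch of a hand-written encoder. It is not the function from the linked page; the name utf8_encode and its return convention (0 for an invalid codepoint) are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode codepoint as UTF-8.
 * Writes up to 4 bytes into out and returns the number of bytes written,
 * or 0 if cp is a surrogate (U+D800..U+DFFF) or above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);

    if (cp < 0x80) {                      /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* beyond the Unicode range */
}
```

For example, U+20AC (the euro sign) falls in the three-byte range and comes out as the bytes E2 82 AC.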

devio
+1  A: 

Might I suggest ICU? It's a reasonably "industry standard" way of handling i18n issues.

I haven't used the C version myself, but I suspect ucnv_fromUnicode might be the function you're after.

Jon Skeet
I'm not going to introduce dependencies on a new set of non-system-provided libraries just for this task. Thanks for the suggestion, though.
Kevin Ballard