I'm looking for a portable, easy-to-use string library for C/C++ that helps me work with Unicode input/output. Ideally, it would store its strings in memory as UTF-8 and let me convert strings from ASCII to UTF-8/UTF-16 and back. I don't need much more than that (OK, a liberal license wouldn't hurt). I have seen that C++ comes with a <locale> header, but it seems to work on wchar_t only, which may or may not be UTF-16 encoded, and I'm not sure how good it actually is.

Use cases are, for example: on Windows, the Unicode APIs expect UTF-16 strings, so I need to convert ASCII or UTF-8 strings before passing them to the API. The same goes for XML parsing: the input may be UTF-16, but I only want to process UTF-8 internally (or, if I switch to UTF-16 internally, I'll need a conversion to that anyway).
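For the Windows case, the core of a UTF-8 to UTF-16 conversion is small enough to sketch without any library. The function name `utf8_to_utf16` is hypothetical, and this is a minimal sketch rather than a full validator: it rejects bad lead bytes, bad continuation bytes, truncated sequences and values above U+10FFFF, but it assumes otherwise well-formed input (overlong encodings and encoded surrogates are not rejected).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes UTF-8 and re-encodes it as UTF-16.
// Rejects bad lead/continuation bytes, truncated sequences and values
// above U+10FFFF; it does NOT reject overlong forms or encoded surrogates.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes after the lead byte
        if (lead < 0x80)              { cp = lead;        extra = 0; }
        else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; extra = 1; }
        else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; extra = 2; }
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (i + extra >= in.size())
            throw std::runtime_error("truncated UTF-8 sequence");

        // Append 6 payload bits from each continuation byte (10xxxxxx).
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp > 0x10FFFF)
            throw std::runtime_error("code point out of range");
        i += extra + 1;

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));  // BMP: one code unit
        } else {
            cp -= 0x10000;                             // supplementary: surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

On Windows, where wchar_t is 16 bits, the result can be copied into a std::wstring for the W-suffixed APIs; the Win32 function MultiByteToWideChar with CP_UTF8 performs the same conversion natively.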

So far, I've looked at ICU, which is quite huge. Moreover, it wants to be built with its own project files, while I'd prefer a library that either has a CMake project or is easy to build (something like: compile all these .c files, link, and good to go), instead of shipping something as large as ICU with my application.

Do you know of such a library that is also actively maintained? After all, this seems to be a pretty basic problem.

+2  A: 

I'd recommend that you look at the GNU iconv library.

Alnitak
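A minimal sketch of the iconv route from C++, assuming a POSIX system whose <iconv.h> declares the input buffer as `char **` (glibc and macOS do; some GNU libiconv builds use `const char **` instead). The wrapper name `convert_encoding` is hypothetical; `iconv_open`, `iconv` and `iconv_close` are the standard calls. The output is returned as raw bytes, so UTF-16LE text comes back as a std::string of byte pairs.

```cpp
#include <cerrno>
#include <cstddef>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical wrapper around POSIX iconv: converts `input` from encoding
// `from` to encoding `to` and returns the output as raw bytes.
std::string convert_encoding(const std::string& input, const char* to, const char* from) {
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)  // iconv_open signals failure with (iconv_t)-1
        throw std::runtime_error("iconv_open: unsupported conversion");

    std::string in = input;  // iconv takes a non-const input pointer
    char* in_ptr = &in[0];
    std::size_t in_left = in.size();

    std::string out;
    char buf[256];
    while (in_left > 0) {
        char* out_ptr = buf;
        std::size_t out_left = sizeof buf;
        std::size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
        out.append(buf, sizeof buf - out_left);
        // E2BIG only means the buffer filled up; loop and convert the rest.
        if (rc == static_cast<std::size_t>(-1) && errno != E2BIG) {
            iconv_close(cd);  // EILSEQ or EINVAL: bad or truncated input
            throw std::runtime_error("iconv: invalid or incomplete input");
        }
    }

    // Flush any pending shift state (matters only for stateful encodings).
    char* out_ptr = buf;
    std::size_t out_left = sizeof buf;
    iconv(cd, nullptr, nullptr, &out_ptr, &out_left);
    out.append(buf, sizeof buf - out_left);

    iconv_close(cd);
    return out;
}
```

For example, `convert_encoding(utf8_text, "UTF-16LE", "UTF-8")` yields little-endian UTF-16 without a byte order mark on glibc. The set of encoding names iconv_open accepts is implementation-defined.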
iconv only lets you convert between encodings. You don't get things like len() functions, case conversion, etc.
Steve Folly
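To illustrate the comment's point: even a basic len() on UTF-8 data has to count code points rather than bytes. A minimal sketch (the name `utf8_length` is hypothetical), assuming valid UTF-8 input: every byte that is not a continuation byte (bit pattern 10xxxxxx) starts a new code point. UTF8-CPP, recommended in the next answer, provides this kind of count as `utf8::distance`.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-8 string by
// skipping continuation bytes (10xxxxxx). Assumes valid UTF-8; no validation.
std::size_t utf8_length(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s)
        if ((c & 0xC0) != 0x80) ++n;  // lead byte or ASCII: a new code point
    return n;
}
```

Code points are not the same as user-perceived characters (an "e" followed by a combining accent is two code points). Case conversion is a harder problem: it needs Unicode's case-mapping tables, which is part of why libraries like ICU are large.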
+9  A: 

UTF8-CPP seems to be exactly what you want.

Nemanja Trifunovic
Any idea how good it is? I've just taken a look at it, and it seems really simple, but I'd like to hear some opinions on it.
Anteru
Well, you won't get an impartial opinion from me, because I am the author :) However, I haven't had any open bugs for more than a year, and people are actually using it (250-300 downloads a month), so I believe it is not that bad :)
Nemanja Trifunovic