Does anyone know of a good, small, open-source Unicode handling library for C or C++? I've looked at ICU, but it seems far too big for my needs.

I need the library to support:

  • all the normal encodings
  • normalization
  • character classification - e.g. finding out whether a character should be allowed in identifiers and comments
  • validation - recognizing nonsense
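To make the "recognizing nonsense" requirement concrete, here is a minimal sketch of what validation means for UTF-8 specifically. It is an illustration only, not taken from any of the libraries discussed here; the function name `is_valid_utf8` is made up for this example. It rejects stray or missing continuation bytes, overlong encodings, surrogate code points, and values above U+10FFFF.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: returns true if `s` is well-formed UTF-8
// (no overlong forms, no surrogates U+D800..U+DFFF, nothing above U+10FFFF).
bool is_valid_utf8(const std::string& s) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s.data());
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b = p[i];
        std::size_t len;    // total bytes in this sequence
        std::uint32_t cp;   // code point being decoded
        std::uint32_t min;  // smallest code point allowed for this length
        if (b < 0x80)                { len = 1; cp = b;        min = 0; }
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; min = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte

        if (len > n - i) return false;  // sequence is cut off at end of input
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = p[i + k];
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min) return false;                       // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // surrogate
        if (cp > 0x10FFFF) return false;                  // beyond Unicode range
        i += len;
    }
    assert(i == n);  // every byte was consumed exactly once
    return true;
}
```

Other encodings need their own checks (e.g. unpaired surrogates in UTF-16), which is part of why a library is attractive here.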
+4  A: 

Well, iconv is a good starting point at least.
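For reference, a rough sketch of what encoding conversion with the POSIX `iconv` API can look like. The wrapper name `convert_with_iconv` and the 256-byte buffer size are arbitrary choices for this example, and the encoding names accepted (such as "UTF-16LE") depend on the iconv implementation installed. Note that iconv only covers the conversion requirement, not normalization or character classification.

```cpp
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <iconv.h>

// Hypothetical wrapper: convert `input` from encoding `from` to encoding `to`.
// Throws std::runtime_error on unsupported encodings or malformed input.
std::string convert_with_iconv(const std::string& input,
                               const char* from, const char* to) {
    iconv_t cd = iconv_open(to, from);  // note the argument order: to, from
    if (cd == (iconv_t)-1) throw std::runtime_error("iconv_open failed");

    std::string in = input;  // iconv wants a non-const char**
    char* src = &in[0];
    std::size_t src_left = in.size();
    std::string out;
    char buf[256];

    while (src_left > 0) {
        char* dst = buf;
        std::size_t dst_left = sizeof buf;
        const std::size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
        out.append(buf, sizeof buf - dst_left);  // keep whatever was produced
        // E2BIG only means the output buffer filled up, so loop again;
        // anything else (EILSEQ, EINVAL) is a real conversion error.
        if (rc == (std::size_t)-1 && errno != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("iconv: invalid or incomplete input");
        }
    }
    // Stateful target encodings would also need a final flush call with a
    // null input pointer; UTF-8 as a target has no shift state.
    iconv_close(cd);
    return out;
}
```

For example, `convert_with_iconv(bytes, "UTF-16LE", "UTF-8")` turns little-endian UTF-16 file contents into UTF-8. On some platforms you need to link with `-liconv`.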

Also, a Google search turns up another Stack Overflow question (the horror!): Light C unicode library

gnud
A: 

How many features do you really need? In many cases I find that converting to one encoding internally (e.g. UTF-8) and handling the various other encodings only when loading or saving is more than sufficient. If you are willing to spend a little time writing a class to handle that, I'm sure you will reuse it again and again.

I have one lying around somewhere, but iirc the UTF32LE/BE is untested: http://aaq.cc/d
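This is not the class linked above, just a rough sketch of the idea: decode whatever comes in at the file boundary and keep UTF-8 everywhere else. The `Encoding` enum and the function names are invented for this example, only two source encodings are shown, and replacing broken UTF-16 with U+FFFD is an assumption about the error policy you would want.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

enum class Encoding { Utf8, Latin1, Utf16LE };  // illustrative subset only

// Append code point `cp` to `out`, encoded as UTF-8.
void append_utf8(std::string& out, std::uint32_t cp) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert raw file bytes in encoding `enc` to the internal UTF-8 form.
std::string to_utf8(const std::string& raw, Encoding enc) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(raw.data());
    const std::size_t n = raw.size();
    std::string out;
    switch (enc) {
    case Encoding::Utf8:
        out = raw;  // passed through unchanged; validate separately if untrusted
        break;
    case Encoding::Latin1:
        // Latin-1 bytes map one-to-one onto U+0000..U+00FF.
        for (std::size_t i = 0; i < n; ++i) append_utf8(out, p[i]);
        break;
    case Encoding::Utf16LE:
        for (std::size_t i = 0; i + 1 < n; i += 2) {
            std::uint32_t u = p[i] | (p[i + 1] << 8);
            if (u >= 0xD800 && u <= 0xDBFF && i + 3 < n) {
                const std::uint32_t lo = p[i + 2] | (p[i + 3] << 8);
                if (lo >= 0xDC00 && lo <= 0xDFFF) {  // valid surrogate pair
                    u = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
                    i += 2;  // the low surrogate has been consumed too
                }
            }
            if (u >= 0xD800 && u <= 0xDFFF) u = 0xFFFD;  // unpaired surrogate
            append_utf8(out, u);
        }
        if (n % 2 != 0) append_utf8(out, 0xFFFD);  // dangling odd byte
        break;
    }
    return out;
}
```

Saving is the mirror image: encode from UTF-8 to the target encoding only when writing the file out.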

If your project really does need to handle various encodings other than to load/save files then you are probably best off with a library though...

jheriko
+3  A: 

UTF8-CPP was recommended in the accepted answer to a similar question: Portable and simple unicode string library for C/C++?

Pukku
+3  A: 

I looked at UTF8-CPP and libiconv, and neither seemed to have all the features I needed. So I guess I'll just use ICU, even though it is really big. I think there are some ways to strip out the unneeded functions and data, so I'll try that. This page (under "Customizing ICU's Data Library") describes how to cut out some of the data.

Zifre
What were the features you needed that were missing? Maybe you should edit the question to say "is there a small Unicode library that does A, B and C without all the overhead of D, E and F?" Then you might find what you are looking for.
jmucchiello
There's a whole section about Making ICU smaller (http://userguide.icu-project.org/packaging#TOC-Making-ICU-Smaller) - you can even link to it statically.
Steven R. Loomis