views: 1301
answers: 2

I'm looking for a small C library to handle UTF-8 strings.

Specifically, splitting on Unicode delimiters for use with stemming algorithms.

Related posts have suggested:

ICU http://www.icu-project.org/ (I found it too bulky for my purposes on embedded devices)

UTF8-CPP: http://utfcpp.sourceforge.net/ (Excellent, but C++ not C)

Has anyone found a platform-independent library with a small codebase for handling Unicode strings? (It doesn't need to do normalisation.)

Any advice would be appreciated.

+8  A: 

A nice, light, library which I use successfully is utf8proc.

Avi
Cheers, it's just what I was looking for.
Akusete
+1  A: 

UTF-8 is specifically designed so that many byte-oriented string functions continue to work unchanged, or need only minor modifications.

C's strstr function, for instance, will work perfectly as long as both its inputs are valid, null-terminated UTF-8 strings. strcpy works fine as long as its input string starts at a character boundary (for instance the return value of strstr).

So you may not even need a separate library!

Artelius
Very true. Until now I had only needed to store/copy strings, and was doing just that. But then I started needing to split/stem words for indexing, so I wanted to make sure I was handling them properly.
Akusete