I have a UTF-8-encoded char*.

Is there a standard function to calculate the number of visible characters represented by the byte array?

I'm on Red Hat (RHEL 5).

+1  A: 

Yes, GLib (glib.h) has g_utf8_strlen().

Check out this page for more information (including three implementations of an algorithm to do this).
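
A minimal sketch of the call, assuming GLib 2.x (compile with: gcc count.c $(pkg-config --cflags --libs glib-2.0)). Note that it counts code points, not grapheme clusters, as discussed in the comments below:

    #include <glib.h>
    #include <stdio.h>

    int main(void)
    {
        /* "héllo": 6 bytes, 5 code points (é is 2 bytes in UTF-8) */
        const char *s = "h\xc3\xa9llo";

        /* -1 means the string is NUL-terminated */
        glong n = g_utf8_strlen(s, -1);

        printf("%ld code points\n", n);  /* prints: 5 code points */
        return 0;
    }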

Evan Carroll
The documentation you pointed to doesn't indicate whether it counts non-independent characters like combining characters or diacritics. Do you know whether it does?
Bill
I think the UTF-8 parlance for what you're describing is "graphemes"; without the use of that term, I would not take it to mean combining characters or diacritics.
Evan Carroll
g_utf8_strlen() only counts code points within a UTF-8 string. Thus, if you had "e\u0301" (U+0301 is the combining acute accent), you'd get 2, while the string itself displays as é.
Thanatos
That whole second link seems to be a lesson in premature optimization.
Thanatos
I disagree with this latter remark. Library code is wholly different from application code: it ought to be as fast as possible, or it will be shunned from performance-critical code. Furthermore, I would argue it is not premature in the sense that it has been thought through and measured, and is not the result of a misconception.
Matthieu M.
+1  A: 

Check out the iconv library (man iconv_open). You can convert the UTF-8 string into, say, UCS-2 or UCS-4, where all characters are the same size. iconv is also (relatively) portable and not Linux- or GNU-specific.
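
A sketch of that approach, assuming glibc's iconv and the "UCS-4LE" encoding name; error handling is kept minimal, and a production version should loop on E2BIG with a growing buffer:

    #include <iconv.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Returns the number of code points in a NUL-terminated UTF-8
     * string, or (size_t)-1 on error. */
    size_t utf8_count(const char *utf8)
    {
        iconv_t cd = iconv_open("UCS-4LE", "UTF-8");
        if (cd == (iconv_t)-1)
            return (size_t)-1;

        size_t inleft = strlen(utf8);
        size_t outsize = inleft * 4;  /* UCS-4 uses 4 bytes per code point */
        char *outbuf = malloc(outsize + 4);
        char *inp = (char *)utf8, *outp = outbuf;
        size_t outleft = outsize, count = (size_t)-1;

        if (outbuf && iconv(cd, &inp, &inleft, &outp, &outleft) != (size_t)-1)
            count = (outsize - outleft) / 4;  /* bytes written / 4 = characters */

        free(outbuf);
        iconv_close(cd);
        return count;
    }

    int main(void)
    {
        printf("%zu\n", utf8_count("h\xc3\xa9llo"));  /* prints 5 */
        return 0;
    }

Note that this still counts code points: like g_utf8_strlen(), it will count a base character and a combining accent as two.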

If GLib, suggested in the other answer, is available to you (beware: it is GPLed), then use it instead, as it is the better way.

Dummy00001
glib is LGPL, not GPL.
Thanatos
Oh. Mea culpa. Then forget the iconv stuff.
Dummy00001