I have a std::string with UTF-8 characters in it.
I want to convert the string to its closest equivalent with ASCII characters.

For example:

Łódź => Lodz
Assunção => Assuncao
Schloß => Schloss

Unfortunately, the ICU library is really unintuitive and I haven't found good documentation on its usage, so it would take me too much time to learn to use it. Time I don't have.

Could someone give a short example of how this can be done?
Thanks.

+2  A: 

I don't know about ICU, but iconv does this and it's quite easy to learn. It takes only about 3-4 calls, and what you need in your case is to set the ICONV_SET_TRANSLITERATE flag using iconvctl().
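For illustration, here is a minimal sketch of the iconv route. It does not use iconvctl(); it assumes instead that the installed iconv accepts the "//TRANSLIT" suffix on the target encoding name, which glibc and GNU libiconv both do as far as I know. The function name to_ascii_translit and the buffer sizing are my own choices, not part of any library. What each character becomes depends on the iconv implementation and the current locale, so treat the result as best effort.

```cpp
#include <iconv.h>

#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert a UTF-8 string to ASCII, asking iconv to
// transliterate characters that have no exact ASCII equivalent.
// Assumes the iconv implementation understands the "//TRANSLIT" suffix.
std::string to_ascii_translit(const std::string& utf8) {
    iconv_t cd = iconv_open("ASCII//TRANSLIT", "UTF-8");
    if (cd == (iconv_t)-1) {
        throw std::runtime_error("iconv_open failed");
    }

    // Generous guess at the output size; a replacement can be longer
    // than the character it replaces (for example "ss" for one letter).
    std::string out(utf8.size() * 4 + 16, '\0');

    char* in_ptr = const_cast<char*>(utf8.data());  // iconv does not modify the input
    std::size_t in_left = utf8.size();
    char* out_ptr = &out[0];
    std::size_t out_left = out.size();

    std::size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);

    if (rc == (std::size_t)-1) {
        // Invalid input, an untranslatable character, or a too-small buffer.
        throw std::runtime_error("iconv conversion failed");
    }

    out.resize(out.size() - out_left);  // keep only the bytes actually written
    return out;
}
```

Calling to_ascii_translit("Schloß") is the intended use. With glibc you may need to call setlocale(LC_ALL, "") with a UTF-8 locale first, otherwise some characters can come out as '?'.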

shoosh
The iconvctl function doesn't seem to be part of standard iconv implementations. At least the Linux system I am working on doesn't have it.
GetFree
iconv is not a standard. It is a library. If you don't have iconvctl, yours is broken: http://www.gnu.org/software/libiconv/
shoosh
Look at the end of this page: http://www.gnu.org/software/libiconv/documentation/libiconv/iconvctl.3.html (section "CONFORMING TO")
GetFree
A: 

This isn't an area I'm an expert in, but if you don't have a library handy that does it for you easily, then you might be better off just creating a lookup table/map which contains the UTF-8 -> ASCII values, i.e. the key is the UTF-8 char and the value is the ASCII sequence of chars.
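A rough sketch of what such a table could look like with plain STL. The names to_ascii_lookup and utf8_length are made up for this example, the table only covers the characters from the question, and it assumes the source file is saved as UTF-8 so the string literals hold UTF-8 bytes. Anything non-ASCII that is missing from the table is replaced with '?'.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Length in bytes of the UTF-8 sequence starting with `lead`.
// Returns 1 for ASCII and for invalid lead bytes so the scan always advances.
static std::size_t utf8_length(unsigned char lead) {
    if ((lead & 0x80) == 0x00) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;
}

// Hypothetical helper: replace each non-ASCII character with its table entry.
std::string to_ascii_lookup(const std::string& utf8) {
    // Key: one UTF-8 encoded character. Value: its ASCII replacement.
    // Illustrative entries only; a real table would be much larger.
    static const std::unordered_map<std::string, std::string> table = {
        {"Ł", "L"}, {"ó", "o"}, {"ź", "z"},
        {"ç", "c"}, {"ã", "a"}, {"ß", "ss"},
    };

    std::string out;
    for (std::size_t i = 0; i < utf8.size();) {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::size_t len = utf8_length(lead);
        if (len > utf8.size() - i) len = utf8.size() - i;  // truncated sequence

        if (lead < 0x80) {
            out += utf8[i];  // plain ASCII passes through unchanged
        } else {
            auto it = table.find(utf8.substr(i, len));
            if (it != table.end()) out += it->second;
            else out += '?';  // not in the table
        }
        i += len;
    }
    return out;
}
```

This only handles precomposed characters that appear in the table. It does not deal with combining marks or with scripts that need context to transliterate.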

OJ
Unfortunately, transliteration is a little more complicated than that.
GetFree
A: 

The ß->ss decomposition tells me you want the compatibility decomposition. In ICU, you need class Normalizer for that. Afterwards, you will end up with something like L'odz'. From this string, you can simply remove the non-ASCII characters. No need for ICU, plain STL will do.
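The second step could look roughly like this, assuming the string has already been normalized to a decomposed form (NFD or NFKD) by some other means. strip_non_ascii is a made-up name. It simply drops every byte of every multi-byte UTF-8 sequence, which removes combining accents and leaves the base letters. One caveat: as far as I know, Ł and ß have no decomposition mapping in Unicode, so after this step they would be dropped entirely rather than turned into L and ss, and would need separate handling.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: remove every non-ASCII byte from a UTF-8 string.
// Intended to run after Unicode decomposition, so that accents are
// separate combining characters and only the base letters remain.
std::string strip_non_ascii(std::string s) {
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](unsigned char c) { return c >= 0x80; }),
            s.end());
    return s;
}
```

For example, a decomposed "ó" is the letter o followed by U+0301 (bytes 0xCC 0x81 in UTF-8), so only the o survives.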

MSalters
A: 

Try this: ucnv_convert("US-ASCII", "UTF-8", target, targetsize, source, sourcesize, pError)

A: 

I wrote a callback that decomposes and then does some substitution. It could probably be implemented as a transliteration. The code is in decompcb.c and the header is nearby. Install it as follows on a Unicode-to-ASCII converter:

ucnv_setFromUCallBack(gConverter, &UCNV_FROM_U_CALLBACK_DECOMPOSE, &status);

Then use gConverter to convert from Unicode to ASCII.

Steven R. Loomis