ansaurus

Question

Java UTF-8 to ASCII conversion with supplements

Answer 1

+1 A:

Is there some simple way to do it, without a million .replaceAll() calls?

If you only need to support European, Latin-based languages, around 100 replacements should be enough. That's definitely doable: grab the Unicode charts for Latin-1 Supplement and Latin Extended-A and get the String.replace party started. :-)
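For illustration, here is a minimal sketch of a replacement-table approach. It is not the full 100-entry table: it swaps in java.text.Normalizer (available since Java 6) to strip accents from letters that decompose into a base letter plus a combining mark, and only uses a hand-written map for letters that do not decompose. The class name AsciiFolder, the method toAscii, the contents of the map, and the "?" fallback are all assumptions made for this example; the spellings chosen ("ss", "AE", "Th", ...) are one possible convention, so extend or change them to taste.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: folds Latin-based text to plain ASCII.
final class AsciiFolder {

    // Hand-picked mappings (an assumption, not a standard) for letters that
    // NFD normalization does not split into base letter + combining accent.
    private static final Map<Character, String> SPECIAL = new HashMap<>();
    static {
        SPECIAL.put('\u00DF', "ss"); // sharp s
        SPECIAL.put('\u00C6', "AE"); // capital ligature AE
        SPECIAL.put('\u00E6', "ae"); // small ligature ae
        SPECIAL.put('\u00D8', "O");  // capital O with stroke
        SPECIAL.put('\u00F8', "o");  // small o with stroke
        SPECIAL.put('\u0110', "D");  // capital D with stroke
        SPECIAL.put('\u0111', "d");  // small d with stroke
        SPECIAL.put('\u0141', "L");  // capital L with stroke
        SPECIAL.put('\u0142', "l");  // small l with stroke
        SPECIAL.put('\u0152', "OE"); // capital ligature OE
        SPECIAL.put('\u0153', "oe"); // small ligature oe
        SPECIAL.put('\u00DE', "Th"); // capital thorn
        SPECIAL.put('\u00FE', "th"); // small thorn
    }

    static String toAscii(String input) {
        // NFD turns e.g. U+00E9 into 'e' followed by U+0301 (combining acute).
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        StringBuilder out = new StringBuilder(decomposed.length());
        for (int i = 0; i < decomposed.length(); i++) {
            char c = decomposed.charAt(i);
            if (c < 128) {
                out.append(c); // already ASCII
            } else if (Character.getType(c) == Character.NON_SPACING_MARK) {
                // Combining accent left over from decomposition: drop it.
            } else {
                // Anything unmapped becomes '?'; a character outside the BMP
                // is two chars here and therefore yields "??".
                String mapped = SPECIAL.get(c);
                out.append(mapped != null ? mapped : "?");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Input written with escapes so the source file stays pure ASCII.
        System.out.println(toAscii("\u00C4rger na\u00EFve \u0141\u00F3d\u017A stra\u00DFe"));
        // prints: Arger naive Lodz strasse
    }
}
```

The normalization step covers most of Latin-1 Supplement and Latin Extended-A without any table entries, so the map only has to hold the leftovers. If you want language-specific output (German "ae" for a-umlaut, say), add those entries to the map and look characters up in it before normalizing.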

Heinzi 2010-03-30 13:02:01

I can't believe nobody has done this already: made a few maps and said, here is one for people who prefer it this way or that way, extend it if you need modifications of your own.

bozo 2010-03-30 18:34:25

Answer 2

+1 A:

You want to use ICU4J. It includes the com.ibm.icu.text.Transliterator class, which apparently can do what you are looking for.

Thomas Pornin 2010-03-30 13:09:48

Except that the ICU4J transliterators I've tried (Latin, Cyrillic and Hangul) are extremely inaccurate. Which exact transliterator do you think would fulfill the original request? I can't find anything that looks suitable.

jarnbjo 2010-03-30 13:56:13

I've tried ICU4J, and it was so complicated that I couldn't even get it to run.

bozo 2010-03-30 18:33:22
