How do I convert Æ and á into regular English characters in Java? I have something like this: Local TV from Paraná. How do I convert it to [Parana]?

A: 

As far as I know, there's no way to do this automatically -- you'd have to substitute manually using String.replaceAll.

String str = "Paraná";
// One replaceAll call per character you want to substitute
str = str.replaceAll("á", "a");
str = str.replaceAll("Æ", "AE"); // Æ is a ligature of A and E
Kaleb Brasee
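If you go the manual route with more than a couple of characters, a lookup table keeps the substitutions in one place. This is only a minimal sketch: the class name ManualTransliterator, the method transliterate and the particular mappings (for example Æ to AE) are hypothetical choices for illustration, not something from the answer above. The characters are written as Unicode escapes so the result does not depend on the source file encoding.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; extend the table with whatever
// characters your data actually contains.
class ManualTransliterator {
    // Key: the character to replace. Value: the chosen ASCII stand-in.
    private static final Map<String, String> REPLACEMENTS = new LinkedHashMap<>();
    static {
        REPLACEMENTS.put("\u00E1", "a");   // U+00E1, a with acute accent
        REPLACEMENTS.put("\u00C6", "AE");  // U+00C6, capital AE ligature
        REPLACEMENTS.put("\u00E6", "ae");  // U+00E6, small ae ligature
    }

    static String transliterate(String text) {
        String result = text;
        for (Map.Entry<String, String> entry : REPLACEMENTS.entrySet()) {
            // String.replace treats both arguments literally (no regex),
            // so no escaping is needed.
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }
}
```

Calling ManualTransliterator.transliterate on "Local TV from Paraná" should give "Local TV from Parana". Any character missing from the table is left untouched.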
+2  A: 

Look at icu4j or the JDK 1.6 Normalizer:

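import java.text.Normalizer;

// Decompose accented characters (NFD), then strip the combining marks.
public String removeAccents(String text) {
    return Normalizer.normalize(text, Normalizer.Form.NFD)
                     .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
}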
bmargulies
You probably meant "Normalizer.normalize(text, Normalizer.Form.NFD)" instead of "Normalizer.decompose(text, false, 0)"
Steve Emmerson
I think I accidentally put in the old sun. class scheme instead. Thanks for catching it.
bmargulies
Normalizer.Form.NFKD may be better than Normalizer.Form.NFD for his purposes, depending on how he wants to treat ligatures. E.g., NFKD will transform `"ﬁ"` (the single-character ligature U+FB01) into `"fi"` (two separate letters).
Laurence Gonsalves
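To make the NFD vs. NFKD difference concrete, here is a small sketch, assuming java.text.Normalizer from the JDK. The class name AccentStripper and the methods stripAccents and toAscii are hypothetical names for illustration. One thing to be aware of: Æ (U+00C6) has no Unicode decomposition, so neither form turns it into "AE". The sketch therefore handles it with an explicit replacement before normalizing, which is my own assumption about what the asker wants.

```java
import java.text.Normalizer;

// Hypothetical helper for illustration, not part of the JDK.
class AccentStripper {
    // Decompose with the given form, then drop the combining marks.
    static String stripAccents(String text, Normalizer.Form form) {
        String decomposed = Normalizer.normalize(text, form);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    // Assumed policy: map the AE ligature by hand, then use NFKD so that
    // compatibility characters such as U+FB01 are expanded as well.
    static String toAscii(String text) {
        String handled = text.replace("\u00C6", "AE").replace("\u00E6", "ae");
        return stripAccents(handled, Normalizer.Form.NFKD);
    }
}
```

With this, stripAccents("Paraná", Normalizer.Form.NFD) gives "Parana". The ligature U+FB01 passes through NFD unchanged but becomes "fi" under NFKD, and Æ survives both forms unless it is replaced explicitly as in toAscii.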