Hi,

I need to compare the names of European places that are written using the extended Latin alphabet. Many Central and Eastern European names are written with characters like 'ž' and 'ü', but some people write the names using just the basic English Latin alphabet.

I need a way to have my system recognise 'mšk žilina' as being the same as 'msk zilina', and similarly for all the other accented characters used. Is there a simple way to do this?

+4  A: 

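You can use java.text.Normalizer and a little regex to get rid of the diacritical marks. Normalizing to NFD splits each accented character into its base letter followed by a separate combining mark, and the regex then removes those marks.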

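import java.text.Normalizer;
import java.text.Normalizer.Form;

// Decompose accented characters (NFD), then strip the combining marks.
public static String removeDiacriticalMarks(String string) {
    return Normalizer.normalize(string, Form.NFD)
        .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
}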

Usage example:

String text = "mšk žilina";
String normalized = removeDiacriticalMarks(text);
System.out.println(normalized); // msk zilina
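
Since the question is about comparing names rather than just printing them, here is a rough sketch of how the same idea could be wrapped into a comparison. The class name PlaceNames and the methods comparisonKey and sameName are made up for illustration, and folding case with Locale.ROOT is an assumption about what "the same" should mean here. The string literals use Unicode escapes only so the example does not depend on the source file encoding.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper class, not part of any library.
class PlaceNames {

    // Builds a key for comparison: NFD-decompose, strip combining marks,
    // then lower-case (the case folding is an assumption, drop it if unwanted).
    static String comparisonKey(String name) {
        String decomposed = Normalizer.normalize(name, Normalizer.Form.NFD);
        return decomposed
            .replaceAll("\\p{InCombiningDiacriticalMarks}+", "")
            .toLowerCase(Locale.ROOT);
    }

    // Two names match when their comparison keys are equal.
    static boolean sameName(String a, String b) {
        return comparisonKey(a).equals(comparisonKey(b));
    }

    public static void main(String[] args) {
        // "mšk žilina" vs "MSK Zilina"
        System.out.println(sameName("m\u0161k \u017Eilina", "MSK Zilina")); // true
        // "Łódź" vs "Lodz": 'Ł' has no canonical decomposition, so it survives
        System.out.println(sameName("\u0141\u00F3d\u017A", "Lodz"));        // false
    }
}
```

One thing to watch for: letters such as 'ł', 'đ' and 'ø' are not a base letter plus a combining mark in Unicode, so NFD leaves them untouched. If names containing them need to match their plain spellings, an extra replacement step for those specific letters would be needed on top of this.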
BalusC
Perfect, thanks.
Oliver
You're welcome.
BalusC