We have a method that is meant to strip out every character that is not alphabetic or whitespace. In practice it keeps only the ASCII letters a-z and A-Z plus the plain space character. It is simply:
String clean(String input)
{
    return input == null ? "" : input.replaceAll("[^a-zA-Z ]", "");
}
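To make the problem concrete, here is a small runnable harness around the method above. The class name `CleanDemo` is only for illustration. It shows that an accented letter is thrown away along with the punctuation, and that a tab is removed too, because the character class only keeps the literal space.

```java
public class CleanDemo {
    // The method from the question, made static so it can be called from main.
    // Keeps only ASCII letters and the space character; everything else is removed.
    public static String clean(String input) {
        return input == null ? "" : input.replaceAll("[^a-zA-Z ]", "");
    }

    public static void main(String[] args) {
        // "Kraków" is written with a Unicode escape so the source file encoding doesn't matter.
        System.out.println(clean("Krak\u00F3w is nice!")); // prints "Krakw is nice"
        System.out.println(clean("a\tb"));                  // prints "ab" (the tab is stripped)
    }
}
```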
This really ought to be fixed to support non-English letters (e.g. ś, ũ, ...). Unfortunately, the Java regex character classes I have looked at (e.g. "\W", a non-word character, and "\p{Alpha}", which is US-ASCII only) don't seem to support this. Is there a way to do this with a Java regex rather than looping manually through each character to test it?
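One sketch that seems to do what we want, assuming "alphabetic" should mean the Unicode letter category `\p{L}`, plus combining marks `\p{M}` so that a decomposed accent (a base letter followed by a combining tilde) is not split apart. It still keeps only the plain space, like the original. The class and constant names are hypothetical.

```java
import java.util.regex.Pattern;

public class UnicodeClean {
    // Assumption: keep Unicode letters (\p{L}), combining marks (\p{M}) and the plain space.
    // Everything else (digits, punctuation, tabs, etc.) is removed.
    private static final Pattern NOT_LETTER_OR_SPACE = Pattern.compile("[^\\p{L}\\p{M} ]");

    public static String clean(String input) {
        return input == null ? "" : NOT_LETTER_OR_SPACE.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // Unicode escapes keep the example independent of the source file encoding.
        System.out.println(clean("Krak\u00F3w is nice!")); // the accented letter is kept, "!" is removed
        System.out.println(clean("\u015B\u0169 123"));     // the two letters and the space are kept, digits removed
    }
}
```

Precompiling the `Pattern` avoids recompiling the regex on every call, which `String.replaceAll` would do. If the Unicode "Alphabetic" property is a better fit than the letter category, the `(?U)` inline flag (`Pattern.UNICODE_CHARACTER_CLASS`, Java 7+) makes `\p{Alpha}` and `\w` Unicode-aware instead of ASCII-only.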