I need to convert Strings that contain letters specific to certain languages (like HÄSTDJUR, note the Ä) into Strings without those special letters (in this case HASTDJUR). How can I do this in Java? Thanks for your help!


It is not really about how it sounds. The scenario is as follows: you want to use the application but don't have a Swedish keyboard. So instead of looking the character up in the character map, you type the word with the special letters replaced by their plain counterparts from the Latin alphabet.

+2  A: 

I'd suggest a mapping from the special characters to the ones you want.

Ä --> A
é --> e
A --> A (exactly the same)
etc...

Then you can apply the mapping to each letter of your text (in pseudocode):

for letter in string:
   newString += map(letter)

Effectively, you need to create a set of rules that says which ASCII equivalent each character maps to.
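
In Java, the mapping idea above could be sketched roughly like this. The class name `AccentMap`, the method `toAscii`, and the contents of the table are made up for illustration; the table is deliberately incomplete and would have to be extended for whatever characters your input actually contains. Unicode escapes are used so the snippet does not depend on the source file encoding.

```java
import java.util.HashMap;
import java.util.Map;

class AccentMap {
    // Hypothetical, incomplete table: special letter -> plain ASCII letter.
    private static final Map<Character, Character> TABLE = new HashMap<>();
    static {
        TABLE.put('\u00C4', 'A'); // A with diaeresis
        TABLE.put('\u00E4', 'a'); // a with diaeresis
        TABLE.put('\u00C5', 'A'); // A with ring above
        TABLE.put('\u00E5', 'a'); // a with ring above
        TABLE.put('\u00D6', 'O'); // O with diaeresis
        TABLE.put('\u00F6', 'o'); // o with diaeresis
        TABLE.put('\u00E9', 'e'); // e with acute
    }

    // Replaces every character that has a table entry; all others pass through unchanged.
    static String toAscii(String text) {
        StringBuilder result = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            char mapped = TABLE.getOrDefault(c, c);
            result.append(mapped);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(toAscii("H\u00C4STDJUR")); // prints HASTDJUR
    }
}
```

The upside of an explicit table is that you decide each replacement yourself; the downside is that any character you forgot to list is left as it is.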

Noel M
Unfortunately, I don't know whether `Ä` sounds like `A` or something else. :)
Adeel Ansari
Who said anything about sounds like? This question seems to be just about removing the decorations on the letters, to put it crudely.
Noel M
Maybe not. I couldn't infer that from the question. Are you going by the example provided? See the comments on the question to see what I mean.
Adeel Ansari
How would you create such a table, and how would you effectively use it?
MSalters
@MSalters: That's another question. Can be done with some predefined rules, I suppose.
Adeel Ansari
@MSalters This is just one way; there are probably much better ones. (1) Create the table: `Map<Character,Character> table = new HashMap<Character,Character>(); table.put('Ä','A'); ...` (2) Look up each character: `Character unicode; ... Character ascii = table.get(unicode);`
emory
+6  A: 

I think your question is the same as this one:

Java - getting rid of accents and converting them to regular letters

and hence the answer is also the same:

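// needs: import java.text.Normalizer;
// NFD splits each accented letter into a base letter plus combining marks;
// the regex then removes every non-ASCII character, including those marks.
String convertedString = 
       Normalizer
           .normalize(input, Normalizer.Form.NFD)
           .replaceAll("[^\\p{ASCII}]", "");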

Example Code:

final String input = "Tĥïŝ ĩš â fůňķŷ Šťŕĭńġ";
System.out.println(
    Normalizer
        .normalize(input, Normalizer.Form.NFD)
        .replaceAll("[^\\p{ASCII}]", "")
);

Output:

This is a funky String
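
To see why this works, here is a small sketch that splits the one-liner into its two steps and applies it to the word from the question. The class and method names (`AccentStripper`, `decompose`, `stripAccents`) are made up for illustration; only `java.text.Normalizer` and `String.replaceAll` come from the answer above. The last line also shows a limitation worth knowing about: a letter with no canonical decomposition, such as ø (U+00F8), is not turned into a plain letter but removed entirely by the regex.

```java
import java.text.Normalizer;

class AccentStripper {
    // Step 1: canonical decomposition (NFD). A precomposed letter such as
    // U+00C4 becomes the base letter 'A' followed by U+0308 (combining diaeresis).
    static String decompose(String input) {
        return Normalizer.normalize(input, Normalizer.Form.NFD);
    }

    // Step 2: drop every non-ASCII character, which includes the combining marks.
    static String stripAccents(String input) {
        return decompose(input).replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        String input = "H\u00C4STDJUR";                // the word from the question
        System.out.println(input.length());            // 8
        System.out.println(decompose(input).length()); // 9: one extra combining mark
        System.out.println(stripAccents(input));       // HASTDJUR
        System.out.println(stripAccents("sm\u00F8r")); // smr: the o with stroke is dropped
    }
}
```

If your input can contain such letters, one option is to handle them separately, for example with a small replacement table, before or after normalizing.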

seanizer
@seanizer: I still need to test it, but this seems to be the solution.
grem