ansaurus

Question

How to replace characters in a Java String?

Answer 1

+6 A:

You can use java.text.Normalizer and a bit of regex to get rid of the diacritics, of which there are many more than you have collected so far.

Here's an SSCCE, copy'n'paste'n'run it on Java 6:

package com.stackoverflow.q2653739;

import java.text.Normalizer;
import java.text.Normalizer.Form;

public class Test {

    public static void main(String... args) {
        System.out.println(removeDiacriticalMarks("Gračišće"));
    }

    public static String removeDiacriticalMarks(String string) {
        return Normalizer.normalize(string, Form.NFD)
            .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }
}

This should yield

Gracisce

At least, it does here in Eclipse with the console character encoding set to UTF-8 (Window > Preferences > General > Workspace > Text File Encoding). Make sure the same is set in your environment as well.
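If the output still looks garbled, it helps to check whether the string itself is wrong or only the console display. A minimal sketch (the `EscapeDump` class and `toEscapes` method are made-up names for illustration) prints every UTF-16 code unit as a `\uXXXX` escape, which is plain ASCII and therefore displays the same under any console encoding. The input is written with escapes too, so the source file encoding does not matter either.

```java
public class EscapeDump {

    // Hypothetical helper: renders every UTF-16 code unit as a backslash-u hex
    // escape, so the result is pure ASCII and unaffected by console encoding.
    public static String toEscapes(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("\\u%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String... args) {
        // "Gracisce" with c-caron, s-caron and c-acute, written as escapes.
        String original = "Gra\u010Di\u0161\u0107e";
        System.out.println(toEscapes(original));
    }
}
```

For this input it prints `\u0047\u0072\u0061\u010D\u0069\u0161\u0107\u0065`. Run on the result of `removeDiacriticalMarks`, every escape should be below `\u0080`; if so, the string is fine and only the console is misconfigured.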

As an alternative, maintain a Map<Character, Character>:

import java.util.HashMap;
import java.util.Map;

Map<Character, Character> charReplacementMap = new HashMap<Character, Character>();
charReplacementMap.put('š', 's');
charReplacementMap.put('đ', 'd');
// Put more here.

String originalString = "Gračišće";
StringBuilder builder = new StringBuilder();

// Copy each char, substituting it when the map has a replacement for it.
for (char currentChar : originalString.toCharArray()) {
    Character replacementChar = charReplacementMap.get(currentChar);
    builder.append(replacementChar != null ? replacementChar : currentChar);
}

String newString = builder.toString();
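Building on the map idea, a reusable method for an arbitrary mapping could look like the sketch below. It makes two assumptions beyond the snippet above: the map values are `String`s, so one character can expand to several (the `ß` to `ss` entry is only an example), and the input is first normalized to NFC so that a decomposed letter (base letter plus combining mark) becomes the single precomposed character the map is keyed on. `CharMapper` and `replaceChars` are illustrative names, and characters outside the Basic Multilingual Plane (surrogate pairs) are simply copied through.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;

public class CharMapper {

    // Hypothetical helper: replaces each character found in the map with its
    // replacement string; characters not in the map are copied unchanged.
    public static String replaceChars(String input, Map<Character, String> mapping) {
        // Compose first, so a decomposed "s" + combining caron becomes the single
        // precomposed character that a char-keyed map can actually match.
        String composed = Normalizer.normalize(input, Normalizer.Form.NFC);
        StringBuilder builder = new StringBuilder(composed.length());
        for (char c : composed.toCharArray()) {
            String replacement = mapping.get(c);
            if (replacement != null) {
                builder.append(replacement);
            } else {
                builder.append(c);
            }
        }
        return builder.toString();
    }

    public static void main(String... args) {
        Map<Character, String> mapping = new HashMap<Character, String>();
        mapping.put('\u0161', "s");  // s with caron
        mapping.put('\u010D', "c");  // c with caron
        mapping.put('\u0107', "c");  // c with acute
        mapping.put('\u00DF', "ss"); // German sharp s, expanded to two letters
        // Decomposed input: "s" followed by U+030C COMBINING CARON.
        System.out.println(replaceChars("Gra\u010Dis\u030C\u0107e", mapping)); // Gracisce
        System.out.println(replaceChars("Stra\u00DFe", mapping));             // Strasse
    }
}
```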

BalusC 2010-04-16 14:39:32

With this solution I get: GraA?iA¡Ae. And by the way, I'd like to replace not only diacritic characters but also some characters from other languages, so I'd really like to know a solution that works for an arbitrary mapping.

ManBugra 2010-04-16 14:44:39

Exactly. The problem is that the diacritics are sometimes combined and sometimes not, and a character-by-character string replace gets confused because there are actually two characters (the base letter plus a combining mark), not one.

Mr. Shiny and New 2010-04-16 14:46:54
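To see the two spellings described above, the sketch below builds `š` both as the single precomposed character U+0161 and as `s` followed by U+030C COMBINING CARON, using escapes so the result does not depend on the source file encoding. `ComposedVsDecomposed` and `canonicallyEqual` are illustrative names; the comparison helper simply normalizes both sides to NFC before calling `equals`.

```java
import java.text.Normalizer;

public class ComposedVsDecomposed {

    // Hypothetical helper: compares two strings after normalizing both to NFC,
    // so precomposed and decomposed spellings of the same letter compare equal.
    public static boolean canonicallyEqual(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String... args) {
        String precomposed = "\u0161";   // LATIN SMALL LETTER S WITH CARON
        String decomposed = "s\u030C";   // "s" + COMBINING CARON
        System.out.println(precomposed.length());                // 1
        System.out.println(decomposed.length());                 // 2
        System.out.println(precomposed.equals(decomposed));      // false
        // NFD splits the letter into base + mark; NFC joins them back.
        System.out.println(Normalizer.normalize(precomposed, Normalizer.Form.NFD).equals(decomposed)); // true
        System.out.println(Normalizer.normalize(decomposed, Normalizer.Form.NFC).equals(precomposed)); // true
        System.out.println(canonicallyEqual(precomposed, decomposed)); // true
    }
}
```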

@Mr. Shiny and New: Yes, System.out.println("š".toCharArray().length); outputs '2'.

ManBugra 2010-04-16 14:49:28

@Mr. Shiny and @ManBugra: The `.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");` should take care of removing the combining diacritical marks. Maybe you removed this line? Or are you running an ancient Java version? The above has worked fine for years here, and it works for an arbitrary mapping except for certain Polish characters such as ł (an l with a stroke through it), since that letter does not decompose into a base letter plus a combining diacritical mark.

BalusC 2010-04-16 14:51:52
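One way to cover the letters the Normalizer misses is to combine both ideas from the answer: strip combining marks first, then map the leftovers explicitly. Stroke letters such as ł, Ł, đ and Đ have no canonical decomposition in Unicode, so `Form.NFD` leaves them unchanged; note that this also affects the `đ`/`Đ` in the question's own character list. The four-entry table below is an illustrative assumption, not a complete list, and `Transliterate`/`toAscii` are hypothetical names.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;

public class Transliterate {

    // Letters with a stroke have no canonical decomposition, so NFD leaves them
    // intact; this hypothetical table handles them explicitly.
    private static final Map<Character, Character> STROKE_LETTERS = new HashMap<Character, Character>();
    static {
        STROKE_LETTERS.put('\u0142', 'l'); // l with stroke (Polish)
        STROKE_LETTERS.put('\u0141', 'L'); // L with stroke
        STROKE_LETTERS.put('\u0111', 'd'); // d with stroke (Croatian)
        STROKE_LETTERS.put('\u0110', 'D'); // D with stroke
    }

    public static String toAscii(String input) {
        // Step 1: decompose and drop combining marks (the approach from the answer).
        String stripped = Normalizer.normalize(input, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
        // Step 2: map the leftovers that the Normalizer cannot split.
        StringBuilder builder = new StringBuilder(stripped.length());
        for (char c : stripped.toCharArray()) {
            Character replacement = STROKE_LETTERS.get(c);
            builder.append(replacement != null ? replacement.charValue() : c);
        }
        return builder.toString();
    }

    public static void main(String... args) {
        System.out.println(toAscii("\u0141\u00F3d\u017A")); // Lodz
        System.out.println(toAscii("\u0110akovo"));         // Dakovo
    }
}
```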

@BalusC: Java 1.6 on Vista using IntelliJ IDEA, and sorry, I just can't get it working. Can you please edit your post and add the imports?

ManBugra 2010-04-16 14:55:29

Done. By the way, it's the IDE console that needs to be set to UTF-8. I tried to reproduce it here with the console set to ISO-8859-1 and got the same result as you.

BalusC 2010-04-16 15:01:02

@BalusC: Yes, the console settings were f*d up. It works now. But still, I need a function for an arbitrary character mapping.

ManBugra 2010-04-16 15:08:17

I edited it in.

BalusC 2010-04-16 15:08:57

Answer 2

A:

I'd use the replace method in a simple loop.

String sourceCharacters = "šđćčŠĐĆČžŽ";
String targetCharacters = "sdccSDCCzZ";

String s = "Gračišće";
for (int i = 0; i < sourceCharacters.length(); i++) {
    // Replace every occurrence of the i-th source character with the i-th target character.
    s = s.replace(sourceCharacters.charAt(i), targetCharacters.charAt(i));
}

System.out.println(s);

Donal Fellows 2010-04-16 14:46:20

Each iteration would create a new String object. It would be nice to do it 'in place'.

ManBugra 2010-04-16 14:52:13

Firstly, each iteration only makes a new object if a change is made; if the character being searched for isn't there, the original object is returned. Secondly, it's *far* more annoying to write this code using `StringBuilder` or `StringBuffer` as you have to do all the work yourself; since Java's memory management is tuned for rapid object turnover anyway, it's easier to do it the way I showed instead of trying to figure out how to be efficient. You can always optimize later if really necessary (i.e., if it is a real bottleneck).

Donal Fellows 2010-04-16 15:29:49
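The first point matches the documented behaviour of `String.replace(char, char)`: when the old character does not occur, a reference to the same `String` object is returned; otherwise a new one is created. A small check (`ReplaceIdentity` and `returnsSameObject` are illustrative names):

```java
public class ReplaceIdentity {

    // Hypothetical helper: true if replace() handed back the very same object.
    public static boolean returnsSameObject(String s, char oldChar, char newChar) {
        return s.replace(oldChar, newChar) == s;
    }

    public static void main(String... args) {
        String s = "Gracisce";
        // 'x' does not occur, so no new String is allocated.
        System.out.println(returnsSameObject(s, 'x', 'y')); // true
        // 'c' does occur, so a new String is created.
        System.out.println(returnsSameObject(s, 'c', 'k')); // false
        System.out.println(s.replace('c', 'k'));            // Grakiske
    }
}
```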

@Donal Fellows: Yes, you are right about your first point, but I don't agree with your second. You write efficient code once, even if it's annoying, and then reuse it. Anyway, BalusC solved the riddle.

ManBugra 2010-04-16 15:47:48
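For the "in place" variant discussed in this thread, a sketch that copies the string into a `char[]` once, overwrites matching characters, and builds a single result `String`. It reuses the parallel source/target strings from the answer (written as escapes to avoid source-encoding trouble); `InPlaceReplace` and `mapChars` are hypothetical names, and like the original loop it assumes precomposed input and characters within the Basic Multilingual Plane.

```java
public class InPlaceReplace {

    // Hypothetical helper: one pass over a char-array copy of the input, mapping
    // sourceChars.charAt(i) to targetChars.charAt(i) (same idea as the loop above).
    public static String mapChars(String input, String sourceChars, String targetChars) {
        if (sourceChars.length() != targetChars.length()) {
            throw new IllegalArgumentException("source and target must have the same length");
        }
        char[] chars = input.toCharArray();   // the only copy made besides the result
        for (int i = 0; i < chars.length; i++) {
            int index = sourceChars.indexOf(chars[i]);
            if (index >= 0) {
                chars[i] = targetChars.charAt(index);   // overwrite in place
            }
        }
        return new String(chars);
    }

    public static void main(String... args) {
        // Escapes for the same ten letters as sourceCharacters in the answer.
        String sourceCharacters = "\u0161\u0111\u0107\u010D\u0160\u0110\u0106\u010C\u017E\u017D";
        String targetCharacters = "sdccSDCCzZ";
        System.out.println(mapChars("Gra\u010Di\u0161\u0107e", sourceCharacters, targetCharacters)); // Gracisce
    }
}
```

Each character costs an `indexOf` over the short source string; for a much larger mapping, a `Map` lookup would scale better.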
