views:

1611

answers:

8

Given a word, I've to replace some specific alphabets with some specific letters such as 1 for a, 5 for b etc. I'm using regex for this. I understand that StringBuilder is the best way to deal with this problem as I'm doing a lot of string manipulations. Here is what I'm doing:

String word = "foobooandfoo";
String converted = "";
converted = word.replaceAll("[ao]", "1");
converted = converted.replaceAll("[df]", "2");
converted = converted.replaceAll("[n]", "3");

My problem is how to rewrite this program using StringBuilder. I tried everything but I can't succeed. Or using String is just fine for this?

+1  A: 

I don't know if StringBuilder is the tool for you here. I'd consider looking at Matcher which is part of the java regex package and might be faster than your example above in case you really need the performance.

auramo
A: 

I had a look at the Matcher.replaceAll() and I noticed that it returns a String. Therefore, I think that what you've got is going to be plenty fast. Regex's are easy to read and quick.

Remember the first rule of optimization: don't do it!

sblundy
+1  A: 

I'd actually say that the code is pretty OK in most applications although it's theoretically inferior to other methods. If you don't want to use the Matcher, try it like this:

StringBuilder result = new StringBuilder(word.length());

for (char c : word.toCharArray()) {
    switch (c) {
        case 'a': case 'o': result.append('1'); break;
        case 'd': case 'f': result.append('2'); break;
        case 'n': result.append('3'); break;
        default: result.append(c); break;
    }
}
Konrad Rudolph
Unfortunately for (char c : word) doesn't seem to work (at least not with my javac). Oh for C#...
Jon Skeet
(Other than that, our solutions are basically identical :)
Jon Skeet
Err, for (char c : word.toCharArray()) works perfectly fine here in Java 5. Which has been out for years...
JeeBee
The original code didn't have the "toCharArray" in.
Jon Skeet
(If we're looking at performance, I'd expect explicit iteration to be faster than copying the string to a new char array.)
Jon Skeet
Thanks to sblundy for correcting my code and yay for StackOverflow to be a wiki!
Konrad Rudolph
A: 

I don't believe you can. All the regex replace APIs use String instead of StringBuilder.

If you're basically converting each char into a different char, you could just do something like:

public String convert(String text)
{
    char[] chars = new char[text.length()];
    for (int i=0; i < text.length(); i++)
    {
        char c = text.charAt(i);
        char converted;
        switch (c)
        {
            case 'a': converted = '1'; break;
            case 'o': converted = '1'; break;
            case 'd': converted = '2'; break;
            case 'f': converted = '2'; break;
            case 'n': converted = '3'; break;
            default : converted = c; break;
        }
        chars[i] = converted;
    }
    return new String(chars);
}

However, if you do any complex regular expressions, that obviously won't help much.

Jon Skeet
+6  A: 

I think this is a case where clarity and performance happily coincide. I would use a lookup table to do the "translation".

  public static void translate(StringBuilder str, char[] table)
  {
    for (int idx = 0; idx < str.length(); ++idx) {
      char ch = str.charAt(idx);
      if (ch < table.length) {
        ch = table[ch];
        str.setCharAt(idx, ch);
      }
    }
  }

If you have a large alphabet for the str input, or your mappings are sparse, you could use a real map, like this:

  public static void translate(StringBuilder str, Map<Character, Character> table)
  {
    for (int idx = 0; idx < str.length(); ++idx) {
      char ch = str.charAt(idx);
      Character conversion = table.get(ch);
      if (conversion != null) 
        str.setCharAt(idx, conversion);
    }
  }

While these implementations work in-place, you could create a new StringBuilder instance (or append to one that's passed in).

erickson
Clarity is in the eye of the beholder, my friend. I find the code in the question more easily understood.
sblundy
True, it's a matter of opinion. With three rules, it's not a problem, but I'd have a hard time spotting a bug in a long list of regex character classes, embedded in a lot of boilerplate.
erickson
This is by far the best solution. Anyone who could code this without using a map (the switch statements) shouldn't be calling themselves software engineers. Your code should NOT have data in it (not even '1', "1" or 'a')
Bill K
A: 

I understand that StringBuilder is the best way to deal with this problem as I'm doing a lot of string manipulations.

Who say that to you? The best way is those that is more clear to read, to the one that uses StringBuilder. The StringBuilder is some circumnstances but in many does not provide a percetible speed up.

You shouldn't initialize "converted" if the value is always replaced.

You can remove some of the boiler plate to improve your code:

String word = "foobooandfoo";
String converted = word.replaceAll("[ao]", "1")
                       .replaceAll("[df]", "2")
                       .replaceAll("[n]", "3");

If you want use StringBuilder you could use this method

java.util.regex.Pattern#matcher(java.lang.CharSequence)

which accept CharSequence (implemented by StringBuilder). See http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html#matcher(java.lang.CharSequence).

Andrea Francia
A: 

I would NOT recommend using any regex for this, those are actually all painfully slow when you're doing simple operations. Instead I'd recommend you start with something like this

// usage:
Map<String, String> replaceRules = new HashMap<String, String>();
replaceRules.put("ao", "1");
replaceRules.put("df", "2");
replaceRules.put("n", "3");
String s = replacePartsOf("foobooandfoo", replaceRules);

// actual method
public String replacePartsOf(String thisString, Map<String, String> withThese) {
    for(Entry<String, String> rule : withThese.entrySet()) {
     thisString = thisString.replaceAll(rule.getKey(), rule.getValue());
    }

    return thisString;
}

and after you've got that working, refactor it to use character arrays instead. While I think what you want to do can be done with StringBuilder it most likely won't be worth the effort.

P Arrayah
String.replaceAll actually uses regular expressions. So the call to it in the for loop is not equivalent to the code supplied by the original poster. See the Javadoc.
laz
I merely meant for him to follow the logic of the snippet above, not to use it exactly. I did say he should "refactor it[the code] to use character arrays instead".
P Arrayah
A: 

StringBuilder vs. regex is a false dichotomy. The reason String#replaceAll() is the wrong tool is because, each time you call it, you're compiling the regex and processing the whole string. You can avoid all that excess work by combining all the regexes into one and using the lower-level methods in Matcher instead of replaceAll(), like so:

String text = "foobooandfoo";
Pattern p = Pattern.compile("([ao])|([df])|n");
Matcher m = p.matcher(text);
StringBuffer sb = new StringBuffer();
while (m.find())
{
  m.appendReplacement(sb, "");
  sb.append(m.start(1) != -1 ? '1' :
            m.start(2) != -1 ? '2' :
                               '3');
}
m.appendTail(sb);
System.out.println(sb.toString());

Of course, this is still overkill; for a job as simple as this one, I recommend erickson's approach.

Alan Moore