views:

369

answers:

8

How to Improve performance of this chunk of code :

public static String concatStrings(Vector strings) {
    String returnValue = "";

    Iterator iter = strings.iterator();
    while( iter.hasNext() ) {
        returnValue += (String)iter.next();
    }

    return returnValue;
}
+13  A: 

You might look at using StringBuilder, rather than doing += with the individual strings. Strings are immutable in Java, which means that once you create a String object, you cannot modify it. Using += on strings in a loop will result in the creation of many separate String instances, which may create performance problems. StringBuilder can concatenate Strings without having to create new instances, which may save some time, depending on the exact scenario.

Andy White
Can you elaborate on this ?
Rachel
The `+=` implicitly creates a **new** String everytime.
BalusC
Can you put your explanation in code ?
Rachel
Create `StringBuilder` before loop, call `append()` method inside loop, use `toString()` after loop to get result.
BalusC
I can't put it in code, but it's pretty easy. Just create a new StringBuilder, and use the "append" method in your loop to add new strings to the buffer. Once you're done, just call stringBuilder.toString() to get the final result.
Andy White
+1  A: 

http://stackoverflow.com/questions/47605/java-string-concatenation

Ray Tayek
Rather post this as a comment.
BalusC
+1  A: 

Also if you are looking to make this faster, you could refactor the code to use ArrayList instead of Vector. ArrayList is not thread safe, so it is slightly faster than Vector (depends on the situation, may be 0% difference, may be 5% difference).

bwawok
+1  A: 

You are creating a string every time you call +=. For example

String theString = "1"; //Makes an immutable String object "1"
theString +="2"; //Makes a new immutable String object "12"
theString +="3"; //makes a new immutable String object "123"

Using a string builder avoids this problem.

StringBuilder sb = new StringBuilder("1"); //Makes a StringBuilder object holding 1
sb.append("2"); //The same StringBuilder object now has "12" in it.
sb.append("3"); //The same StringBuidler object now has "123" in it. 
String theString = sb.toString(); //Creates a new String object with "123" in it 

Note how in the first example we made all of those intermediate strings where in the second example we only created the StringBuilder and the final String (in both examples we created "1" "2" and "3" when we used them as arguments). You can see that there are fewer objects created in the first example, and if you're doing a lot of appending to the String, you can imagine how that adds up!

JF
+6  A: 

As suggested by other answers, using a StringBuilder would probably be a better option.

The code given in the question will actually be compiled (with Sun's javac) to something along the line of the following:

public static String concatStrings(Vector strings) {
    String returnValue = "";

    Iterator iter = strings.iterator();
    while( iter.hasNext() ) {
        String str = (String)iter.next();

        StringBuilder sb = new StringBuilder(returnValue);
        sb.append(str);

        returnValue = sb.toString();
    }

    return returnValue;
}

The compiler will change the += string concatenation with one that uses a StringBuilder. However, the compiler will probably rewrite the code inside the loop so a new StringBuilder instance will be created on each iteration, which is not very performance-friendly.

Therefore, in this case, it would probably be a better idea to create a StringBuilder outside the loop on our own, and perform manual string concatenation:

public static String concatStrings(Vector strings) {
    StringBuidler returnValueBuilder;

    Iterator iter = strings.iterator();
    while( iter.hasNext() ) {
        returnValueBuilder.append((String)iter.next());
    }

    return returnValueBuilder.toString();
}
coobird
+1 for the fact that the compiler will optimise concatination in certain circumstances i.e. '+' isn't always evil. However in cases like the one presented in the question the overhead is not in creating new StringBuilder instances (object allocation is usually very fast) but in copying the StringBuilder buffer into the String before returning the new String (String guarauntees immutability so they can't reuse the same buffer)
CurtainDog
@CurtainDog - IIRC, `StringBuilder.toString()` cleverly avoids copying the character array when creating the result `String`. Take a look at the code.
Stephen C
David
@David - I learned something new today! Apparently the change happened between 1.4.x and 1.5 ... http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6219959
Stephen C
@Stephen C - thanks, you've made me feel quite young... all my serious java coding must've come post 1.5. Of course, I see that my initial comment is inaccurate, for 'can't reuse' read 'find it problematic to reuse'
CurtainDog
+1  A: 

In addition to using a StringBuilder, you can walk the list of Strings beforehand and calculate the exact size of needed for the StringBuilder. Then pass this value into the StringBuilder constructor. Note that this would fall into the category of premature optimisation but you did ask for performance... (You should look at the code for growing the StringBuilder/StringBuffer buffers, its educational)

CurtainDog
+8  A: 
public static String concatStrings(List<String> strings) {
    StringBuilder sb = new StringBuilder();
    for (String s : strings) {
       sb.append(s);
    }    
    return sb.toString();
}

Some remarks:

  • Use StringBuilder whenever you need to build a string in a loop
    • + is fine for simple concatenation, but horrible for incremental build
  • Whenever possible, use for-each for readability
  • java.util.Vector is synchronized; if you don't need this (costly) feature, just use an ArrayList.

Don't use raw types

  • JLS 4.8 Raw Types

    The use of raw types is allowed only as a concession to compatibility of legacy code. The use of raw types in code written after the introduction of genericity into the Java programming language is strongly discouraged. It is possible that future versions of the Java programming language will disallow the use of raw types.

  • Effective Java 2nd Edition: Item 23: Don't use raw types in new code

    If you use raw types, you lose all the safety and expressiveness benefits of generics.

See also

polygenelubricants
See also http://stackoverflow.com/questions/2770321/what-is-a-raw-type-and-why-shouldnt-we-use-it/
polygenelubricants
+4  A: 
private static final int AVERAGE_STRING_LENGTH = 10;  // Using 10 is arbitrary

public static final String concatStrings(final Collection<String> strings) {
    if (strings == null)   return null;

    final int size = strings.size();
    if (size == 0)         return "";
    if (size == 1)         return strings.get(0);

    final StringBuilder returnValue =
        new StringBuilder(AVERAGE_STRING_LENGTH * size);

    for (String s : strings) {
        returnValue.append(s);
    }

    return returnValue.toString();
}

Perhaps going a little overboard, here's every optimization I could think of for concatStrings() - demonstrated above - some of which may not be applicable to your environment:

  • Use StringBuilder - its far more efficient for these successive concatenations
  • Use StringBuilder(int capacity) to specify the likely necessary capacity, if there's any way to anticipate it (used average size above, but other methods may be more convenient)
  • Use the Collection parameter type to allow for a more efficient data structure than Vector, which is synchronized - plus the caller has far greater flexibility (e.g. no need to copy a Set<String> to a Vector<String> just to call this method)
  • Hardcode simple cases, if they are likely (e.g. null, size 0, and size 1 cases above)
  • Use final to facilitate JIT inlining and optimization
  • Cache the size of strings, if its used multiple times. (e.g. used 3 times in the above code.)

Finally, if this operation is done very often over a large number of strings, look into Ropes for Java.

Bert F