views:

5977

answers:

11

I've a performance related question regarding use of StringBuilder. In a very long loop I'm manipulating a StringBuilder and passing it to another method like this:

for (loop condition) {
    StringBuilder sb = new StringBuilder();
    sb.append("some string");
    . . .
    sb.append(anotherString);
    . . .
    passToMethod(sb.toString());
}

Is instantiating StringBuilder at every loop cycle is a good solution? And is calling a delete instead better, like the following?

StringBuilder sb = new StringBuilder();
for (loop condition) {
    sb.delete(0, sb.length);
    sb.append("some string");
    . . .
    sb.append(anotherString);
    . . .
    passToMethod(sb.toString());
}
+3  A: 

The modern JVM is really smart about stuff like this. I would not second guess it and do something hacky that is less maintainable/readable...unless you do proper bench marks with production data that validate a non-trivial performance improvement (and document it ;)

Stu Thompson
Where "non-trivial" is key - benchmarks can show one form being *proportionally* faster, but with no hint about how much time that's taking in the real app :)
Jon Skeet
See the benchmark in my answer below. The second way is faster.
Epaga
@Epaga: Your benchmark says little about the performance improvement in the real app, where the time taken to do the StringBuilder allocation may be trivial compared with the rest of the loop. That's why context is important in benchmarking.
Jon Skeet
@Jon I understand, but I*m assuming that if his whole question is geared towards which one has a higher performance, that a 25-50% difference IS important and that that part of his code will be called many times.
Epaga
@Epaga: Until he's measured it with his real code, we'll have no clue how significant it really is. If there's a lot of code for each iteration of the loop, I strongly suspect it'll still be irrelevant. We don't know what's in the "..."
Jon Skeet
(Don't get me wrong, btw - your benchmark results are still very interesting in themselves. I'm fascinated by microbenchmarks. I just don't like bending my code out of shape before performing real-life tests as well.)
Jon Skeet
wise words, i think we both fully agree. :-)
Epaga
+1  A: 

Alas I'm on C# normally, I think that clearing a StringBuilder weights more than instanciating a new one.

m0rb
+11  A: 

In the philosophy of writing solid code its always better to put your StringBuilder inside your loop. This way it doesnt go outside the code its intended for.

Secondly the biggest improvment in StringBuilder comes from giving it an initial size to avoid it growing bigger while the loop runs

for (loop condition) {
  StringBuilder sb = new StringBuilder(4096);
}
Peter
You could always scope the whole thing with curly brackets, that way you don't have the Stringbuilder outside.
Epaga
@Epaga: It's still outside the loop itself. Yes, it doesn't pollute the outer scope, but it's an unnatural way to write the code for a performance improvement which hasn't been verified *in context*.
Jon Skeet
Or even better, put the whole thing in its own method. ;-) But I hear ya re: context.
Epaga
Better yet initialize with the expected size instead of sum arbitrary number (4096)Your code may return a String that references a char[] of size 4096 (depends on the JDK; as far as I remember that was the case for 1.4 )
kohlerm
Yes indeed i should have mentioned that.
Peter
+2  A: 

Based on my experience with developing software on Windows I would say clearing the StringBuilder out during your loop has better performance than instantiating a StringBuilder with each iteration. Clearing it frees that memory to be overwritten immediately with no additional allocation required. I'm not familiar enough with the Java garbage collector, but I would think that freeing and no reallocation (unless your next string grows the StringBuilder) is more beneficial than instantiation.

(My opinion is contrary to what everyone else is suggesting. Hmm. Time to benchmark it.)

cfeduke
The thing is that more memory has to be reallocated anyway, as the existing data is being used by the newly created String at the end of the previous loop iteration.
Jon Skeet
Oh that makes sense, I had though that toString was allocating and returning a new string instance and the byte buffer for the builder was clearing instead of re-allocating.
cfeduke
Epaga's benchmark shows that clearing and re-using is a gain over instantiation at every pass.
cfeduke
+18  A: 

The second one is about 25% faster in my mini-benchmark.

public class ScratchPad {

    static String a;

    public static void main( String[] args ) throws Exception {
        long time = System.currentTimeMillis();
        for( int i = 0; i < 10000000; i++ ) {
            StringBuilder sb = new StringBuilder();
            sb.append( "someString" );
            sb.append( "someString2"+i );
            sb.append( "someStrin4g"+i );
            sb.append( "someStr5ing"+i );
            sb.append( "someSt7ring"+i );
            a = sb.toString();
        }
        System.out.println( System.currentTimeMillis()-time );
        time = System.currentTimeMillis();
        StringBuilder sb = new StringBuilder();
        for( int i = 0; i < 10000000; i++ ) {
            sb.delete( 0, sb.length() );
            sb.append( "someString" );
            sb.append( "someString2"+i );
            sb.append( "someStrin4g"+i );
            sb.append( "someStr5ing"+i );
            sb.append( "someSt7ring"+i );
            a = sb.toString();
        }
        System.out.println( System.currentTimeMillis()-time );
    }
}

Results:

25265
17969

Note that this is with JRE 1.6.0_07.


Based on Jon Skeet's ideas in the edit, here's version 2. Same results though. So stop upvoting the people saying #1 is faster. :)

public class ScratchPad {

    static String a;

    public static void main( String[] args ) throws Exception {
        long time = System.currentTimeMillis();
        StringBuilder sb = new StringBuilder();
        for( int i = 0; i < 10000000; i++ ) {
            sb.delete( 0, sb.length() );
            sb.append( "someString" );
            sb.append( "someString2" );
            sb.append( "someStrin4g" );
            sb.append( "someStr5ing" );
            sb.append( "someSt7ring" );
            a = sb.toString();
        }
        System.out.println( System.currentTimeMillis()-time );
        time = System.currentTimeMillis();
        for( int i = 0; i < 10000000; i++ ) {
            StringBuilder sb2 = new StringBuilder();
            sb2.append( "someString" );
            sb2.append( "someString2" );
            sb2.append( "someStrin4g" );
            sb2.append( "someStr5ing" );
            sb2.append( "someSt7ring" );
            a = sb2.toString();
        }
        System.out.println( System.currentTimeMillis()-time );
    }
}

Results:

5016
7516
Epaga
I've added an edit in my answer to explain why this might be happening. I'll look more carefully in a while (45 mins). Note that doing concatenation in the append calls reduces the point of using StringBuilder in the first place somewhat :)
Jon Skeet
Also it would be interesting to see what happens if you reverse the two blocks - the JIT is still "warming up" StringBuilder during the first test. It may well be irrelevant, but interesting to try.
Jon Skeet
Tried both. It's still a lot faster.
Epaga
I'd still go with the first version because it's *cleaner*. But it's good that you've actually done the benchmark :)Next suggested change: try #1 with an appropriate capacity passed into the constructor.
Jon Skeet
Verified, running this benchmark with the tests reversed and several times back to back results in a substantial performance gain (3129ms with reallocation vs. 5903ms for instantiation) after removing concatenation.
cfeduke
Also with 1024 for the constructor and increasing the append operations to perform roughly 1024 characters (less than 1024 so no additional allocation is required) is 5264ms reallocation vs. 13985ms instantiation.
cfeduke
@cfeduke: Thanks. That's *very* interesting. Will look into that when I get to work. This has reached the point of being genuinely surprising. When I've worked out what's going on, I'll write a fuller answer :)
Jon Skeet
I'd want to see it run with the OP's production data. Does he/she have 200 appends? Are the strings really big? How does this impact the benchmark?
Stu Thompson
Wow. 5264 vs 13985 IS surprising.
Epaga
It is 10 million iterations and instantiations after all; at what point does allocation lose to instantiation?
cfeduke
Interesting. I would have expected better performance with StringBuilder outside the loop, but not 25% better.
Kip
Use sb.setLength(0); instead, it's the fastest way to empty the contents of StringBuilder against recreating object or using .delete(). Note that this doesn't apply to StringBuffer, its concurrency checks nullify the speed advantage.
P Arrayah
+4  A: 

Okay, I now understand what's going on, and it does make sense.

I was under the impression that toString just passed the underlying char[] into a String constructor which didn't take a copy. A copy would then be made on the next "write" operation (e.g. delete). I believe this was the case with StringBuffer in some previous version. (It isn't now.) But no - toString just passes the array (and index and length) to the public String constructor which takes a copy.

So in the "reuse the StringBuilder" case we genuinely create one copy of the data per string, using the same char array in the buffer the whole time. Obviously creating a new StringBuilder each time creates a new underlying buffer - and then that buffer is copied (somewhat pointlessly, in our particular case, but done for safety reasons) when creating a new string.

All this leads to the second version definitely being more efficient - but at the same time I'd still say it's uglier code.

(EDIT: I've now deleted my other answer as irrelevant and unhelpful.)

Jon Skeet
Just some funny info about the .NET, there situation is different. The .NET StringBuilder internally modifies regular "string" object and toString method simply returns it (marking it as non-modifiable, so consequent StringBuilder manipulations will re-create it). So, typical "new StringBuilder->modify it->to String" sequence will not make any extra copy (only for expanding the storage or shrinking it, if resulting string length is much shorter than its capacity). In Java this cycle always makes at least one copy (in StringBuilder.toString()).
Ivan Dubrov
A: 

Declare once, and assign each time. It is a more pragmatic and reusable concept than an optimization.

A: 

The first is better for humans. If the second is a bit faster on some versions of some JVMs, so what?

If performance is that critical, bypass StringBuilder and write your own. If you're a good programmer, and take into account how your app is using this function, you should be able to make it even faster. Worthwhile? Probably not.

Why is this question stared as "favorite question"? Because performance optimization is so much fun, no matter whether it is practical or not.

dongilmore
It isn't an academic question only. While most of the times (read 95%) I prefer readability and maintainability, there are really cases that little improvements makes big differences...
Pier Luigi
OK, I'll change my answer. If an object provides a method that allows it to be cleared and reused, then do so. Examine the code first if you want to make sure the clear is efficient; maybe it releases a private array! If efficient, then allocate the object outside the loop and reuse it inside.
dongilmore
+5  A: 

Since I don't think it's been pointed out yet, because of optimizations built into the Sun Java compiler, which automatically creates StringBuilders (StringBuffers pre-J2SE 5.0) when it sees String concatenations, the first example in the question is equivalent to:

for (loop condition) {
  String s = "some string";
  . . .
  s += anotherString;
  . . .
  passToMethod(s);
}

Which is more readable, IMO, the better approach. Your attempts to optimize may result in gains in some platform, but potentially losses others.

But if you really are running into issues with performance, then sure, optimize away. I'd start with explicitly specifying the buffer size of the StringBuilder though, per Jon Skeet.

Jack Leow
+2  A: 

Faster still:

public class ScratchPad {

    private static String a;

    public static void main( String[] args ) throws Exception {
        long time = System.currentTimeMillis();
        StringBuilder sb = new StringBuilder();

        for( int i = 0; i < 10000000; i++ ) {
            // Resetting the string is faster than creating a new object.
            // Since this is a critical loop, every instruction counts.
            //
            sb.setLength( 0 );
            sb.append( "someString" );
            sb.append( "someString2" ).append( i );
            sb.append( "someStrin4g" ).append( i );
            sb.append( "someStr5ing" ).append( i );
            sb.append( "someSt7ring" ).append( i );
            setA( sb.toString() );
        }

        System.out.println( System.currentTimeMillis()-time );
    }

    private static void setA( String aString ) {
        a = aString;
    }
}

In the philosophy of writing solid code, the inner workings of the method should be hidden from the objects that use the method. Thus it makes no difference from the system's perspective whether you redeclare the StringBuilder within the loop or outside of the loop. Since declaring it outside of the loop is faster, and it does not make the code more complicated to read, then reuse the object rather than reinstantiate it.

Even if the code was more complicated, and you knew for certain that object instantiation was the bottleneck, comment it.

Note that "someString"+i is the true bottleneck as the + operator is expensive; use append(...) instead.

Dave Jarvis
A: 

The reason why doing a 'setLength' or 'delete' improves the performance is mostly the code 'learning' the right size of the buffer, and less to do the memory allocation. Generally, I recommend letting the compiler do the string optimizations. However, if the performance is critical, I'll often pre-calculate the expected size of the buffer. The default StringBuilder size is 16 characters. If you grow beyond that, then it has to resize. Resizing is where the performance is getting lost. Here's another mini-benchmark which illustrates this:

private void clear() throws Exception {
    long time = System.currentTimeMillis();
    int maxLength = 0;
    StringBuilder sb = new StringBuilder();

    for( int i = 0; i < 10000000; i++ ) {
        // Resetting the string is faster than creating a new object.
        // Since this is a critical loop, every instruction counts.
        //
        sb.setLength( 0 );
        sb.append( "someString" );
        sb.append( "someString2" ).append( i );
        sb.append( "someStrin4g" ).append( i );
        sb.append( "someStr5ing" ).append( i );
        sb.append( "someSt7ring" ).append( i );
        maxLength = Math.max(maxLength, sb.toString().length());
    }

    System.out.println(maxLength);
    System.out.println("Clear buffer: " + (System.currentTimeMillis()-time) );
}

private void preAllocate() throws Exception {
    long time = System.currentTimeMillis();
    int maxLength = 0;

    for( int i = 0; i < 10000000; i++ ) {
        StringBuilder sb = new StringBuilder(82);
        sb.append( "someString" );
        sb.append( "someString2" ).append( i );
        sb.append( "someStrin4g" ).append( i );
        sb.append( "someStr5ing" ).append( i );
        sb.append( "someSt7ring" ).append( i );
        maxLength = Math.max(maxLength, sb.toString().length());
    }

    System.out.println(maxLength);
    System.out.println("Pre allocate: " + (System.currentTimeMillis()-time) );
}

public void testBoth() throws Exception {
    for(int i = 0; i < 5; i++) {
     clear();
     preAllocate();
    }
}

The results show reusing the object is about 10% faster than creating a buffer of the expected size.

brianegge