ansaurus

Question

Java performance of StringBuilder in a loop

Answer 1

+3 A:

The modern JVM is really smart about stuff like this. I would not second guess it and do something hacky that is less maintainable/readable...unless you do proper bench marks with production data that validate a non-trivial performance improvement (and document it ;)

Stu Thompson 2008-10-28 07:12:29

Where "non-trivial" is key - benchmarks can show one form being *proportionally* faster, but with no hint about how much time that's taking in the real app :)

Jon Skeet 2008-10-28 07:28:47

See the benchmark in my answer below. The second way is faster.

Epaga 2008-10-28 07:34:45

@Epaga: Your benchmark says little about the performance improvement in the real app, where the time taken to do the StringBuilder allocation may be trivial compared with the rest of the loop. That's why context is important in benchmarking.

Jon Skeet 2008-10-28 07:40:39

@Jon I understand, but I*m assuming that if his whole question is geared towards which one has a higher performance, that a 25-50% difference IS important and that that part of his code will be called many times.

Epaga 2008-10-28 07:43:29

@Epaga: Until he's measured it with his real code, we'll have no clue how significant it really is. If there's a lot of code for each iteration of the loop, I strongly suspect it'll still be irrelevant. We don't know what's in the "..."

Jon Skeet 2008-10-28 07:52:58

(Don't get me wrong, btw - your benchmark results are still very interesting in themselves. I'm fascinated by microbenchmarks. I just don't like bending my code out of shape before performing real-life tests as well.)

Jon Skeet 2008-10-28 07:59:14

wise words, i think we both fully agree. :-)

Epaga 2008-10-28 08:05:00

Answer 2

+1 A:

Alas I'm on C# normally, I think that clearing a StringBuilder weights more than instanciating a new one.

m0rb 2008-10-28 07:14:15

Answer 3

+11 A:

In the philosophy of writing solid code its always better to put your StringBuilder inside your loop. This way it doesnt go outside the code its intended for.

Secondly the biggest improvment in StringBuilder comes from giving it an initial size to avoid it growing bigger while the loop runs

for (loop condition) {
  StringBuilder sb = new StringBuilder(4096);
}

Peter 2008-10-28 07:14:30

You could always scope the whole thing with curly brackets, that way you don't have the Stringbuilder outside.

Epaga 2008-10-28 07:37:22

@Epaga: It's still outside the loop itself. Yes, it doesn't pollute the outer scope, but it's an unnatural way to write the code for a performance improvement which hasn't been verified *in context*.

Jon Skeet 2008-10-28 07:42:03

Or even better, put the whole thing in its own method. ;-) But I hear ya re: context.

Epaga 2008-10-28 07:46:20

Better yet initialize with the expected size instead of sum arbitrary number (4096)Your code may return a String that references a char[] of size 4096 (depends on the JDK; as far as I remember that was the case for 1.4 )

kohlerm 2008-10-28 07:53:24

Yes indeed i should have mentioned that.

Peter 2008-10-28 09:21:10

Answer 4

+2 A:

Based on my experience with developing software on Windows I would say clearing the StringBuilder out during your loop has better performance than instantiating a StringBuilder with each iteration. Clearing it frees that memory to be overwritten immediately with no additional allocation required. I'm not familiar enough with the Java garbage collector, but I would think that freeing and no reallocation (unless your next string grows the StringBuilder) is more beneficial than instantiation.

(My opinion is contrary to what everyone else is suggesting. Hmm. Time to benchmark it.)

cfeduke 2008-10-28 07:17:24

The thing is that more memory has to be reallocated anyway, as the existing data is being used by the newly created String at the end of the previous loop iteration.

Jon Skeet 2008-10-28 07:19:49

Oh that makes sense, I had though that toString was allocating and returning a new string instance and the byte buffer for the builder was clearing instead of re-allocating.

cfeduke 2008-10-28 07:27:46

Epaga's benchmark shows that clearing and re-using is a gain over instantiation at every pass.

cfeduke 2008-10-28 07:43:48

Answer 5

+18 A:

The second one is about 25% faster in my mini-benchmark.

public class ScratchPad {

    static String a;

    public static void main( String[] args ) throws Exception {
        long time = System.currentTimeMillis();
        for( int i = 0; i < 10000000; i++ ) {
            StringBuilder sb = new StringBuilder();
            sb.append( "someString" );
            sb.append( "someString2"+i );
            sb.append( "someStrin4g"+i );
            sb.append( "someStr5ing"+i );
            sb.append( "someSt7ring"+i );
            a = sb.toString();
        }
        System.out.println( System.currentTimeMillis()-time );
        time = System.currentTimeMillis();
        StringBuilder sb = new StringBuilder();
        for( int i = 0; i < 10000000; i++ ) {
            sb.delete( 0, sb.length() );
            sb.append( "someString" );
            sb.append( "someString2"+i );
            sb.append( "someStrin4g"+i );
            sb.append( "someStr5ing"+i );
            sb.append( "someSt7ring"+i );
            a = sb.toString();
        }
        System.out.println( System.currentTimeMillis()-time );
    }
}

Results:

25265
17969

Note that this is with JRE 1.6.0_07.

Based on Jon Skeet's ideas in the edit, here's version 2. Same results though. So stop upvoting the people saying #1 is faster. :)

public class ScratchPad {

    static String a;

    public static void main( String[] args ) throws Exception {
        long time = System.currentTimeMillis();
        StringBuilder sb = new StringBuilder();
        for( int i = 0; i < 10000000; i++ ) {
            sb.delete( 0, sb.length() );
            sb.append( "someString" );
            sb.append( "someString2" );
            sb.append( "someStrin4g" );
            sb.append( "someStr5ing" );
            sb.append( "someSt7ring" );
            a = sb.toString();
        }
        System.out.println( System.currentTimeMillis()-time );
        time = System.currentTimeMillis();
        for( int i = 0; i < 10000000; i++ ) {
            StringBuilder sb2 = new StringBuilder();
            sb2.append( "someString" );
            sb2.append( "someString2" );
            sb2.append( "someStrin4g" );
            sb2.append( "someStr5ing" );
            sb2.append( "someSt7ring" );
            a = sb2.toString();
        }
        System.out.println( System.currentTimeMillis()-time );
    }
}

Results:

5016
7516

Epaga 2008-10-28 07:17:45

I've added an edit in my answer to explain why this might be happening. I'll look more carefully in a while (45 mins). Note that doing concatenation in the append calls reduces the point of using StringBuilder in the first place somewhat :)

Jon Skeet 2008-10-28 07:26:20

Also it would be interesting to see what happens if you reverse the two blocks - the JIT is still "warming up" StringBuilder during the first test. It may well be irrelevant, but interesting to try.

Jon Skeet 2008-10-28 07:27:23

Tried both. It's still a lot faster.

Epaga 2008-10-28 07:33:35

I'd still go with the first version because it's *cleaner*. But it's good that you've actually done the benchmark :)Next suggested change: try #1 with an appropriate capacity passed into the constructor.

Jon Skeet 2008-10-28 07:35:58

Verified, running this benchmark with the tests reversed and several times back to back results in a substantial performance gain (3129ms with reallocation vs. 5903ms for instantiation) after removing concatenation.

cfeduke 2008-10-28 07:38:22

Also with 1024 for the constructor and increasing the append operations to perform roughly 1024 characters (less than 1024 so no additional allocation is required) is 5264ms reallocation vs. 13985ms instantiation.

cfeduke 2008-10-28 07:45:34

@cfeduke: Thanks. That's *very* interesting. Will look into that when I get to work. This has reached the point of being genuinely surprising. When I've worked out what's going on, I'll write a fuller answer :)

Jon Skeet 2008-10-28 07:50:43

I'd want to see it run with the OP's production data. Does he/she have 200 appends? Are the strings really big? How does this impact the benchmark?

Stu Thompson 2008-10-28 07:53:54

Wow. 5264 vs 13985 IS surprising.

Epaga 2008-10-28 07:55:19

It is 10 million iterations and instantiations after all; at what point does allocation lose to instantiation?

cfeduke 2008-10-28 15:09:28

Interesting. I would have expected better performance with StringBuilder outside the loop, but not 25% better.

Kip 2008-10-30 14:34:53

Use sb.setLength(0); instead, it's the fastest way to empty the contents of StringBuilder against recreating object or using .delete(). Note that this doesn't apply to StringBuffer, its concurrency checks nullify the speed advantage.

P Arrayah 2008-11-11 21:00:48

Answer 6

+4 A:

Okay, I now understand what's going on, and it does make sense.

I was under the impression that toString just passed the underlying char[] into a String constructor which didn't take a copy. A copy would then be made on the next "write" operation (e.g. delete). I believe this was the case with StringBuffer in some previous version. (It isn't now.) But no - toString just passes the array (and index and length) to the public String constructor which takes a copy.

So in the "reuse the StringBuilder" case we genuinely create one copy of the data per string, using the same char array in the buffer the whole time. Obviously creating a new StringBuilder each time creates a new underlying buffer - and then that buffer is copied (somewhat pointlessly, in our particular case, but done for safety reasons) when creating a new string.

All this leads to the second version definitely being more efficient - but at the same time I'd still say it's uglier code.

(EDIT: I've now deleted my other answer as irrelevant and unhelpful.)

Jon Skeet 2008-10-28 08:18:59

Just some funny info about the .NET, there situation is different. The .NET StringBuilder internally modifies regular "string" object and toString method simply returns it (marking it as non-modifiable, so consequent StringBuilder manipulations will re-create it). So, typical "new StringBuilder->modify it->to String" sequence will not make any extra copy (only for expanding the storage or shrinking it, if resulting string length is much shorter than its capacity). In Java this cycle always makes at least one copy (in StringBuilder.toString()).

Ivan Dubrov 2009-07-16 01:54:50

Answer 7

A:

Declare once, and assign each time. It is a more pragmatic and reusable concept than an optimization.

2008-10-28 13:52:05

Answer 8

A:

The first is better for humans. If the second is a bit faster on some versions of some JVMs, so what?

If performance is that critical, bypass StringBuilder and write your own. If you're a good programmer, and take into account how your app is using this function, you should be able to make it even faster. Worthwhile? Probably not.

Why is this question stared as "favorite question"? Because performance optimization is so much fun, no matter whether it is practical or not.

dongilmore 2008-10-30 00:32:23

It isn't an academic question only. While most of the times (read 95%) I prefer readability and maintainability, there are really cases that little improvements makes big differences...

Pier Luigi 2008-10-30 13:08:43

OK, I'll change my answer. If an object provides a method that allows it to be cleared and reused, then do so. Examine the code first if you want to make sure the clear is efficient; maybe it releases a private array! If efficient, then allocate the object outside the loop and reuse it inside.

dongilmore 2008-11-02 17:23:50

Answer 9

+5 A:

Since I don't think it's been pointed out yet, because of optimizations built into the Sun Java compiler, which automatically creates StringBuilders (StringBuffers pre-J2SE 5.0) when it sees String concatenations, the first example in the question is equivalent to:

for (loop condition) {
  String s = "some string";
  . . .
  s += anotherString;
  . . .
  passToMethod(s);
}

Which is more readable, IMO, the better approach. Your attempts to optimize may result in gains in some platform, but potentially losses others.

But if you really are running into issues with performance, then sure, optimize away. I'd start with explicitly specifying the buffer size of the StringBuilder though, per Jon Skeet.

Jack Leow 2008-10-30 13:07:51

Answer 10

+2 A:

Faster still:

public class ScratchPad {

    private static String a;

    public static void main( String[] args ) throws Exception {
        long time = System.currentTimeMillis();
        StringBuilder sb = new StringBuilder();

        for( int i = 0; i < 10000000; i++ ) {
            // Resetting the string is faster than creating a new object.
            // Since this is a critical loop, every instruction counts.
            //
            sb.setLength( 0 );
            sb.append( "someString" );
            sb.append( "someString2" ).append( i );
            sb.append( "someStrin4g" ).append( i );
            sb.append( "someStr5ing" ).append( i );
            sb.append( "someSt7ring" ).append( i );
            setA( sb.toString() );
        }

        System.out.println( System.currentTimeMillis()-time );
    }

    private static void setA( String aString ) {
        a = aString;
    }
}

In the philosophy of writing solid code, the inner workings of the method should be hidden from the objects that use the method. Thus it makes no difference from the system's perspective whether you redeclare the StringBuilder within the loop or outside of the loop. Since declaring it outside of the loop is faster, and it does not make the code more complicated to read, then reuse the object rather than reinstantiate it.

Even if the code was more complicated, and you knew for certain that object instantiation was the bottleneck, comment it.

Note that "someString"+i is the true bottleneck as the + operator is expensive; use append(...) instead.

Dave Jarvis 2009-06-22 06:20:33

Answer 11

A:

The reason why doing a 'setLength' or 'delete' improves the performance is mostly the code 'learning' the right size of the buffer, and less to do the memory allocation. Generally, I recommend letting the compiler do the string optimizations. However, if the performance is critical, I'll often pre-calculate the expected size of the buffer. The default StringBuilder size is 16 characters. If you grow beyond that, then it has to resize. Resizing is where the performance is getting lost. Here's another mini-benchmark which illustrates this:

private void clear() throws Exception {
    long time = System.currentTimeMillis();
    int maxLength = 0;
    StringBuilder sb = new StringBuilder();

    for( int i = 0; i < 10000000; i++ ) {
        // Resetting the string is faster than creating a new object.
        // Since this is a critical loop, every instruction counts.
        //
        sb.setLength( 0 );
        sb.append( "someString" );
        sb.append( "someString2" ).append( i );
        sb.append( "someStrin4g" ).append( i );
        sb.append( "someStr5ing" ).append( i );
        sb.append( "someSt7ring" ).append( i );
        maxLength = Math.max(maxLength, sb.toString().length());
    }

    System.out.println(maxLength);
    System.out.println("Clear buffer: " + (System.currentTimeMillis()-time) );
}

private void preAllocate() throws Exception {
    long time = System.currentTimeMillis();
    int maxLength = 0;

    for( int i = 0; i < 10000000; i++ ) {
        StringBuilder sb = new StringBuilder(82);
        sb.append( "someString" );
        sb.append( "someString2" ).append( i );
        sb.append( "someStrin4g" ).append( i );
        sb.append( "someStr5ing" ).append( i );
        sb.append( "someSt7ring" ).append( i );
        maxLength = Math.max(maxLength, sb.toString().length());
    }

    System.out.println(maxLength);
    System.out.println("Pre allocate: " + (System.currentTimeMillis()-time) );
}

public void testBoth() throws Exception {
    for(int i = 0; i < 5; i++) {
     clear();
     preAllocate();
    }
}

The results show reusing the object is about 10% faster than creating a buffer of the expected size.

brianegge 2009-07-16 01:22:08

ansaurus

tags:

views:

answers:

Java performance of StringBuilder in a loop

related questions