ansaurus

Question

Is using StringBuilder Remove method more memory efficient than creating a new StringBuilder in loop?

Answer 1

A:

I'd say option #2 if definitely more straightforward. In terms of performance, sounds like something you'd just need to test and see. I'd guess that it doesn't make enough difference to choose the less straightforward option.

bdukes 2008-11-05 22:13:46

Answer 2

+4 A:

Option 2 should (I believe) actually outperform option 1. The act of calling Remove "forces" the StringBuilder to take a copy of the string it's already returned. The string is actually mutable within StringBuilder, and StringBuilder doesn't take a copy unless it needs to. With option 2 it copies before basically clearing the array out - with option 1 no copy is required.

The only downside of option 2 is that if the string ends up being long, there will be multiple copies made while appending - whereas option 1 keeps the original size of buffer. If this is going to be the case, however, specify an initial capacity to avoid the extra copying. (In your sample code, the string will end up being bigger than the default 16 characters - initializing it with a capacity of, say, 32 will reduce the extra strings required.)

Aside from the performance, however, option 2 is just cleaner.

Jon Skeet 2008-11-05 22:16:09

Wouldn't string.Concat("prefix ", outputString, " postfix") be faster than both options?

configurator 2009-03-05 21:21:06

Yes, in this case - but I treated the question as taking the use of a StringBuilder as a given (e.g. in the real code there's an appending loop before all of this.)

Jon Skeet 2009-03-05 23:17:41

Why "take a copy" at all? All we are trying to do is clear the string.

Vulcan Eager 2009-08-25 09:36:41

Answer 3

+1 A:

Hate to say it, but how about just testing it?

Joel Lucsy 2008-11-05 22:19:28

Any suggestions on how to profile memory would be helpful. Do you know of any memory profilers for .NET other than the commercial ANTS product?

Sixto Saez 2008-11-05 22:27:00

JetBrains dotTrace

bdukes 2008-11-05 22:30:08

Answer 4

+1 A:

This stuff is easy to find out by yourself. Run Perfmon.exe and add a counter for .NET Memory + Gen 0 Collections. Run the test code a million times. You'll see that option #1 requires half of the number of collections option #2 needs.

Hans Passant 2008-11-05 22:27:52

Did you try it when you also specify an appropriate capacity in the StringBuilder constructor? I'd then expect option 2 to beat option 1... not in terms of *collections* but because it doesn't do as much data copying.

Jon Skeet 2008-11-05 22:34:14

Same answer isn't it? Did you try it?

Hans Passant 2008-11-06 00:52:07

Answer 5

A:

I think option 1 would be slightly more memory efficient as a new object isn't created everytime. Having said that, the GC does a pretty good job of cleaning up resources like in option 2.

I think you may be falling into the trap of premature optimization (the root of all evil --Knuth). Your IO is going to take much more resources than the string builder.

I tend go with the clearer/cleaner option, in this case option 2.

Rob

Robert Wagner 2008-11-05 22:31:32

Answer 6

+1 A:

We've talked about this before with Java, here's the [Release] results of the C# version:

Option #1 (10000000 iterations): 11264ms
Option #2 (10000000 iterations): 12779ms

Update: In my non-scientific analysis allowing the two methods to execute while monitoring all the memory performance counters in perfmon did not result in any sort of discernible difference with either method (other than having some counters spike only while either test was executing).

And here's what I used to test:

class Program
{
    const int __iterations = 10000000;

    static void Main(string[] args)
    {
     TestStringBuilder();
     Console.ReadLine();
    }

    public static void TestStringBuilder()
    {
     //potentially a collection with several hundred items:
     var outputStrings = new [] { "test1", "test2", "test3" };

     var stopWatch = new Stopwatch();

     //Option #1
     stopWatch.Start();
     var formattedOutput = new StringBuilder();

     for (var i = 0; i < __iterations; i++)
     {
      foreach (var outputString in outputStrings)
      {
       formattedOutput.Append("prefix ");
       formattedOutput.Append(outputString);
       formattedOutput.Append(" postfix");

       var output = formattedOutput.ToString();
       ExistingOutputMethodThatOnlyTakesAString(output);

       //Clear existing string to make ready for next iteration:
       formattedOutput.Remove(0, output.Length);
      }
     }
     stopWatch.Stop();

     Console.WriteLine("Option #1 ({1} iterations): {0}ms", stopWatch.ElapsedMilliseconds, __iterations);
            Console.ReadLine();
     stopWatch.Reset();

     //Option #2
     stopWatch.Start();
     for (var i = 0; i < __iterations; i++)
     {
      foreach (var outputString in outputStrings)
      {
       StringBuilder formattedOutputInsideALoop = new StringBuilder();

       formattedOutputInsideALoop.Append("prefix ");
       formattedOutputInsideALoop.Append(outputString);
       formattedOutputInsideALoop.Append(" postfix");

       ExistingOutputMethodThatOnlyTakesAString(
          formattedOutputInsideALoop.ToString());
      }
     }
     stopWatch.Stop();

     Console.WriteLine("Option #2 ({1} iterations): {0}ms", stopWatch.ElapsedMilliseconds, __iterations);
    }

    private static void ExistingOutputMethodThatOnlyTakesAString(string s)
    {
     // do nothing
    }
}

Option 1 in this scenario is marginally faster though option 2 is easier to read and maintain. Unless you happen to be performing this operation millions of times back to back I would stick with Option 2 because I suspect that option 1 and 2 are about the same when running in a single iteration.

cfeduke 2008-11-05 22:34:23

Oh yes memory efficient, helps to read the question fully. :) Run the above and monitor perfmon.

cfeduke 2008-11-05 22:41:06

Try it again while specifying the capacity for the StringBuilder as 32 in both cases - then option 2 is faster (on my box anyway).

Jon Skeet 2008-11-05 23:03:31

Yes option 2 is substantially faster (10403ms) when the initial size is set to 32 - in fact option 1 is slower (15502ms) when 32 is specified than when nothing is specified (11264ms).

cfeduke 2008-11-06 02:16:16

Answer 7

+1 A:

Since you are concerned only with memory I would suggest:

foreach (string outputString in outputStrings)
    {    
        string output = "prefix " + outputString + " postfix";
        ExistingOutputMethodThatOnlyTakesAString(output)  
    }

The variable named output is the same size in your original implementation, but no other objects are needed. StringBuilder uses strings and other objects internally and you will be created many objects that need to be GC'd.

Both the line from option 1:

string output = formattedOutput.ToString();

And the line from option 2:

ExistingOutputMethodThatOnlyTakesAString(
           formattedOutputInsideALoop.ToString());

will create an immutable object with the value of the prefix + outputString + postfix. This string is the same size no matter how you create it. What you are really asking is which is more memory efficient:

    StringBuilder formattedOutput = new StringBuilder(); 
    // create new string builder

or

    formattedOutput.Remove(0, output.Length); 
    // reuse existing string builder

Skipping the StringBuilder entirely will be more memory efficient than either of the above.

If you really need to know which of the two is more efficient in your application (this will probably vary based on the size of your list, prefix, and outputStrings) I would recommend red-gate ANTS Profiler http://www.red-gate.com/products/ants_profiler/index.htm

Jason

Jason Hernandez 2008-11-05 23:41:56

Answer 8

+2 A:

While your profiling, you could also try just setting the length of the StringBuilder to zero when you enter the loop.

formattedOutput.Length = 0;

Ty 2008-11-05 23:49:55

Great suggestion. Didn't know I could do that.

Sixto Saez 2008-11-06 02:19:04

Answer 9

A:

Measure it
Pre-allocate as close as possible to how much memory you think you'll need
If speed is your preference, then consider a fairly straight forward multi-threaded front to middle, middle to end concurrent approach (expand division of labour as required)
measure it again

what's more important to you?

memory
speed
clarity

adam straughan 2008-11-06 15:55:16

Answer 10

+1 A:

Several of the answers gently suggested that I get off my duff and figure out it myself so below are my results. I think that sentiment generally goes against the grain of this site but if you want something done right, you might as well do.... :)

I modified option #1 to take advantage of @Ty suggestion to use StringBuilder.Length = 0 instead of the Remove method. This made the code of the two options more similar. The two differences are now whether the constructor for the StringBuilder is in or out of the loop and option #1 now uses the the Length method to clear the StringBuilder. Both options were set to run over an outputStrings array with 100,000 elements to make the garbage collector do some work.

A couple answers offered hints to look at the various PerfMon counters & such and use the results to pick an option. I did some research and ended up using the built-in Performance Explorer of the Visual Studio Team Systems Developer edition that I have at work. I found the second blog entry of a multipart series that explained how to set it up here. Basically, you wire up a unit test to point at the code you want to profile; go through a wizard & some configurations; and launch the unit test profiling. I enabled the .NET object allocation & lifetime metrics. The results of the profiling where difficult to format for this answer so I placed them at the end. If you copy and paste the text into Excel and massage them a bit, they'll be readable.

Option #1 is the most memory efficiency because it makes the garbage collector do a little less work and it allocates half the memory and instances to the StringBuilder object than Option #2. For everyday coding, picking option #2 is perfectly fine.

If you're still reading, I asked this question because Option #2 will make the memory leak detectors of an experience C/C++ developer go ballistic. A huge memory leak will occur if the StringBuilder instance is not released before being reassigned. Of course, we C# developers don't worry about such things (until they jump up and bite us). Thanks to all!!

ClassName   Instances TotalBytesAllocated Gen0_InstancesCollected Gen0BytesCollected Gen1InstancesCollected Gen1BytesCollected
=======Option #1        
System.Text.StringBuilder   100,001 2,000,020 100,016 2,000,320 2 40
System.String   301,020 32,587,168 201,147 11,165,268 3 246
System.Char[]   200,000 8,977,780 200,022 8,979,678 2 90
System.String[] 1 400,016 26 1,512 0 0
System.Int32    100,000 1,200,000 100,061 1,200,732 2 24
System.Object[] 100,000 2,000,000 100,070 2,004,092 2 40
======Option #2     
System.Text.StringBuilder   200,000 4,000,000 200,011 4,000,220 4 80
System.String   401,018 37,587,036 301,127 16,164,318 3 214
System.Char[]   200,000 9,377,780 200,024 9,379,768 0 0
System.String[] 1 400,016 20 1,208 0 0
System.Int32    100,000 1,200,000 100,051 1,200,612 1 12
System.Object[] 100,000 2,000,000 100,058 2,003,004 1 20

Sixto Saez 2008-11-07 21:48:21

ansaurus

tags:

views:

answers:

Is using StringBuilder Remove method more memory efficient than creating a new StringBuilder in loop?

related questions