views: 1061

answers: 5

How is the StringBuilder class implemented? Does it internally create a new string object each time we append?

+3  A: 

Not really - it uses an internal character buffer. Only when the buffer's capacity is exhausted does it allocate a new buffer. An Append operation simply adds to this buffer; a string object is created only when ToString() is called on it. It is therefore advisable when doing many string concatenations, since each traditional string concatenation creates a new string. You can also pass an initial capacity to the StringBuilder, if you have a rough idea of the final size, to avoid multiple allocations.
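A minimal sketch of the initial-capacity advice, using only the public `StringBuilder(int capacity)` constructor and the `Capacity` property. The class name `CapacityDemo`, the helper `Build`, and the value 256 are made up for illustration. It shows only the observable behaviour and says nothing about how the buffer is stored internally.

```csharp
using System;
using System.Text;

class CapacityDemo
{
    // Hypothetical helper: appends 100 chars to a builder created with the given capacity.
    public static StringBuilder Build(int capacity)
    {
        var sb = new StringBuilder(capacity);
        for (int i = 0; i < 10; i++)
        {
            sb.Append("0123456789"); // 10 chars per call, 100 in total
        }
        return sb;
    }

    static void Main()
    {
        StringBuilder sb = Build(256);

        // 100 chars fit into the reserved 256, so no growth was needed.
        Console.WriteLine(sb.Capacity == 256);

        // The final string object is only produced here.
        string result = sb.ToString();
        Console.WriteLine(result.Length);
    }
}
```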

Edit: People are pointing out that my understanding is wrong. Please ignore the answer (I'd rather not delete it - it will stand as proof of my ignorance :-)

VinayC
It acts *as though* it were a character buffer, but it really is a mutated `string` instance. Honest.
Marc Gravell
Thanks Marc - I was under the impression that it uses a character buffer. So it must have some native implementation to mutate the string object.
VinayC
Sure, but it is a core framework class. It has access to the native implementation.
Marc Gravell
@VinayC - apologies, it looks like this has changed in .NET 4 (see the other comments on this page).
Marc Gravell
Never mind - I was under the same impression even for previous versions - so sure that I didn't even bother to check it in Reflector.
VinayC
+33  A: 

In .NET 2.0 it uses the String class internally. String is only immutable from outside mscorlib: its mutating methods are internal to that assembly, where StringBuilder also lives, so StringBuilder can do that.

In .NET 4.0 it was changed to char[].

In 2.0 StringBuilder looked like this:

public sealed class StringBuilder : ISerializable
{
    // Fields
    private const string CapacityField = "Capacity";
    internal const int DefaultCapacity = 0x10;
    internal IntPtr m_currentThread;
    internal int m_MaxCapacity;
    internal volatile string m_StringValue; // HERE ----------------------
    private const string MaxCapacityField = "m_MaxCapacity";
    private const string StringValueField = "m_StringValue";
    private const string ThreadIDField = "m_currentThread";
    // ... (remaining members omitted)
}

But in 4.0 it looks like this:

public sealed class StringBuilder : ISerializable
{
    // Fields
    private const string CapacityField = "Capacity";
    internal const int DefaultCapacity = 0x10;
    internal char[] m_ChunkChars; // HERE --------------------------------
    internal int m_ChunkLength;
    internal int m_ChunkOffset;
    internal StringBuilder m_ChunkPrevious;
    internal int m_MaxCapacity;
    private const string MaxCapacityField = "m_MaxCapacity";
    internal const int MaxChunkSize = 0x1f40; // 8000 chars
    private const string StringValueField = "m_StringValue";
    private const string ThreadIDField = "m_currentThread";
    // ... (remaining members omitted)
}

So evidently it was changed from using a string to using a char[].
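To make the .NET 4 field list easier to read, here is a simplified sketch of how a chain of char[] chunks linked through a "previous" reference can work. `ChunkedBuilder` and all of its members are hypothetical names that only mirror the fields shown above (m_ChunkChars, m_ChunkLength, m_ChunkOffset, m_ChunkPrevious). The growth policy is an assumption made for the example. This is not the framework's actual code.

```csharp
using System;

// Hypothetical illustration only; not the real StringBuilder.
sealed class ChunkedBuilder
{
    private const int MaxChunkSize = 8000;  // same value as 0x1f40 in the listing
    private char[] chunkChars;              // the current (last) chunk
    private int chunkLength;                // chars used in the current chunk
    private int chunkOffset;                // chars held by all previous chunks
    private ChunkedBuilder chunkPrevious;   // link to the preceding chunk, if any

    public ChunkedBuilder()
    {
        chunkChars = new char[16];
    }

    // Snapshot of another node, used to freeze a full chunk.
    private ChunkedBuilder(ChunkedBuilder source)
    {
        chunkChars = source.chunkChars;
        chunkLength = source.chunkLength;
        chunkOffset = source.chunkOffset;
        chunkPrevious = source.chunkPrevious;
    }

    public int Length { get { return chunkOffset + chunkLength; } }

    public ChunkedBuilder Append(string value)
    {
        if (value == null) return this;
        int index = 0;
        while (index < value.Length)
        {
            if (chunkLength == chunkChars.Length)
            {
                StartNewChunk(value.Length - index);
            }
            int free = chunkChars.Length - chunkLength;
            int count = Math.Min(free, value.Length - index);
            value.CopyTo(index, chunkChars, chunkLength, count);
            chunkLength += count;
            index += count;
        }
        return this;
    }

    // Existing chars are never copied: the full chunk becomes the "previous" node.
    private void StartNewChunk(int minimum)
    {
        chunkPrevious = new ChunkedBuilder(this);
        chunkOffset += chunkLength;
        chunkLength = 0;
        // Assumed policy: roughly double, but never exceed MaxChunkSize.
        int size = Math.Min(Math.Max(minimum, chunkOffset), MaxChunkSize);
        chunkChars = new char[size];
    }

    public override string ToString()
    {
        var result = new char[Length];
        // Walk the chain backwards, copying each chunk to its own offset.
        for (ChunkedBuilder c = this; c != null; c = c.chunkPrevious)
        {
            Array.Copy(c.chunkChars, 0, result, c.chunkOffset, c.chunkLength);
        }
        return new string(result);
    }
}

class ChunkedBuilderDemo
{
    static void Main()
    {
        var sb = new ChunkedBuilder();
        for (int i = 0; i < 1000; i++) sb.Append("abc");
        Console.WriteLine(sb.Length);                   // 3000
        Console.WriteLine(sb.ToString().Substring(0, 6)); // abcabc
    }
}
```

The point of the design is that growing never copies what has already been written; the cost is that ToString() has to walk the chain and stitch the chunks together.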

EDIT: Updated answer to reflect changes in .NET 4 (that I only just discovered).

Brian Rasmussen
Had no idea... Think I'm gonna do some Reflector magic to satisfy my curiosity :)
cwap
@Brian: as far as I know it holds a `Char` array internally, not a `String` (at least in .NET 4, perhaps this has changed?)
Fredrik Mörk
@Fredrik - in the MS implementation, it really is a `string` that gets mutated
Marc Gravell
@Marc: this got me curious so I checked with Reflector; looks like this has changed. It was a `string` before, now it seems to be a `char` array being manipulated instead.
Fredrik Mörk
@Fredrik - then I take it back!
Marc Gravell
@Fredrik: I was just going through the code in Reflector while you commented. I have updated the answer.
Brian Rasmussen
http://www.nesterovsky-bros.com/weblog/2010/08/25/StringAndStringBuilderInNET4.aspx
0A0D
@0A0D Thanks for the link.
Brian Rasmussen
@Brian: NP. It was posted today so they could have easily copied your answer :)
0A0D
+2  A: 

Looking at .NET 2 in .NET Reflector, I find this:

public StringBuilder Append(string value)
{
    if (value != null)
    {
        // The backing store is an ordinary string instance.
        string stringValue = this.m_StringValue;
        IntPtr currentThread = Thread.InternalGetCurrentThread();
        // If this thread is not the recorded owner, obtain a separate string to work on.
        if (this.m_currentThread != currentThread)
        {
            stringValue = string.GetStringForStringBuilder(stringValue, stringValue.Capacity);
        }
        int length = stringValue.Length;
        int requiredLength = length + value.Length;
        if (this.NeedsAllocation(stringValue, requiredLength))
        {
            // Not enough room: get a larger string, then append into it in place.
            string newString = this.GetNewString(stringValue, requiredLength);
            newString.AppendInPlace(value, length);
            this.ReplaceString(currentThread, newString);
        }
        else
        {
            // Enough room: mutate the existing string in place.
            stringValue.AppendInPlace(value, length);
            this.ReplaceString(currentThread, stringValue);
        }
    }
    return this;
}

So it is a mutated string instance...

EDIT: Except in .NET 4, where it is a char[].

Yves M.
@Richard: thanks for the EDIT. Didn't know that fact.
Yves M.
+1  A: 

If you want to see one possible implementation (similar to the one Microsoft shipped up to v3.5), you can look at the source of the Mono one on GitHub.

VirtualBlackFox
A: 

Just a guess: it is chunked to avoid the LOH (large object heap) for large strings.
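The arithmetic behind that guess, as a small sketch. It takes MaxChunkSize (0x1f40) from the .NET 4 listing in the answer above and assumes the commonly documented LOH threshold of 85,000 bytes. The array object's own header is ignored, and the names `LohArithmetic` and `ChunkBytes` are made up for the example.

```csharp
using System;

class LohArithmetic
{
    public const int MaxChunkSize = 0x1f40;      // 8000 chars per chunk at most
    public const int LohThresholdBytes = 85000;  // assumed LOH threshold

    // Size in bytes of the char data in one maximum-size chunk.
    public static int ChunkBytes()
    {
        return MaxChunkSize * sizeof(char);      // 8000 * 2 = 16000
    }

    static void Main()
    {
        Console.WriteLine(ChunkBytes());                     // 16000
        Console.WriteLine(ChunkBytes() < LohThresholdBytes); // True
    }
}
```

So a single chunk stays far below the threshold, whereas one contiguous buffer for a string of more than about 42,500 chars would not.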

lidali