I'm wondering if this code ...

StringBuilder sb = new StringBuilder("Please read the following messages.");

... initializes sb with a buffer exactly as large as the string passed to the constructor. On the one hand, this would seem the most logical thing. On the other hand, it seems to kind of defeat the purpose of the StringBuilder class for one of its most common uses, which is to provide mutability to make repeated appends more efficient. (The very first call to Append, if the answer to my question is "yes", would require sb to resize itself.)

Then again, I suppose one could view this as analogous to the constructor for List<T> that takes an IEnumerable<T> as a parameter. Maybe the assumption in this case is that you're not planning on appending a lot, but rather on manipulating what's already there.

The only real research I've done on this was to check the MSDN documentation on StringBuilder, which didn't provide an answer (it says the constructor initializes the instance "using the specified string," but doesn't indicate how the string is used).


EDIT: So it's "implementation-specific"... does this not seem weird to anyone else? I mean, the purpose of the StringBuilder class is to offer an alternative to performing a lot of operations on a string, creating a ton of immutable string instances along the way; therefore, it's for efficiency. I feel like the behavior of this constructor ought to be specified, so that the developer can make an informed decision about how to use it regardless of platform.

I mean, it is implemented by Microsoft a certain way; they could easily have put that in the documentation (forcing other implementations to follow suit). Just a personal source of puzzlement...

+4  A: 

Check the StringBuilder's Capacity member.

From MSDN:

The StringBuilder dynamically allocates more space when required and increases Capacity accordingly. For performance reasons, a StringBuilder might allocate more memory than needed. The amount of memory allocated is implementation-specific.
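A quick way to see what your own runtime does is to print Capacity right after construction. This is only a sketch: the class and method names (CapacityProbe, InitialCapacity) are made up for illustration, and the number printed is implementation-specific, so no expected output is given.

```csharp
using System;
using System.Text;

public static class CapacityProbe
{
    // Hypothetical helper: returns whatever capacity this runtime
    // picked for a builder seeded with the given string.
    public static int InitialCapacity(string seed)
    {
        return new StringBuilder(seed).Capacity;
    }

    public static void Main()
    {
        string seed = "Please read the following messages.";
        Console.WriteLine("Length:   " + seed.Length);
        // Implementation-specific value; do not rely on it.
        Console.WriteLine("Capacity: " + InitialCapacity(seed));
    }
}
```

The only thing that should hold everywhere is that Capacity is at least Length.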

devio
+5  A: 

It's an implementation detail that you shouldn't need to worry about. However, using .NET Reflector and looking at the (string, int32, int32, int32) overload of the constructor (which the other constructors call), we can see that it picks a capacity that is a multiple of 16 (the next largest over the requested size).

Edit

Actually, it's 16 x 2^n, with "n" being the smallest value that makes the capacity at least the requested size.
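To make the 16 x 2^n rule concrete, here is a minimal sketch of it. It is not the framework's code: RoundUpCapacity and CapacitySketch are hypothetical names, and the starting value of 16 is simply the figure observed above.

```csharp
using System;

public static class CapacitySketch
{
    // Hypothetical helper: start at 16 and keep doubling until the
    // result is at least the requested length.
    public static int RoundUpCapacity(int length)
    {
        int capacity = 16;
        while (capacity < length)
        {
            capacity = unchecked(capacity * 2);
            if (capacity < 0)
            {
                // Doubling overflowed int; fall back to the exact length.
                capacity = length;
                break;
            }
        }
        return capacity;
    }

    public static void Main()
    {
        foreach (int length in new[] { 0, 16, 17, 35, 100 })
        {
            Console.WriteLine(length + " -> " + RoundUpCapacity(length));
        }
    }
}
```

Under this rule a 35-character string lands on 64 (16, then 32, then 64), which is why the very first Append does not necessarily force a resize.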

Damien_The_Unbeliever
Good call -- I just checked this myself and reached the same conclusion (by checking the `Capacity` property after initialization... somehow this did not occur to me until after I'd posted the question).
Dan Tao
... and yet you shouldn't rely on this. If you care enough about the required capacity to post a question on Stack Overflow, you should just use one of the overloads where you can specify it. The whole point of "implementation-specific" is that it can change in the future. Don't rely on undocumented behavior.
Lasse V. Karlsen
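For reference, the overload that takes both a string and a capacity looks like this in use. A small sketch only: ExplicitCapacity and Create are hypothetical names, and the 256 in Main is an arbitrary guess at the final size.

```csharp
using System;
using System.Text;

public static class ExplicitCapacity
{
    // Hypothetical helper: seed the builder and ask for room up front.
    public static StringBuilder Create(string seed, int expectedTotalLength)
    {
        // Never request less than the seed itself needs.
        int capacity = Math.Max(seed.Length, expectedTotalLength);
        return new StringBuilder(seed, capacity);
    }

    public static void Main()
    {
        StringBuilder sb = Create("Please read the following messages.", 256);
        sb.AppendLine();
        sb.Append("First message.");
        Console.WriteLine(sb.ToString());
    }
}
```

This way the starting size is something you chose rather than something you discovered.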
@Lasse - Oh, agreed. I wouldn't advocate *relying* on this detail. (And I'm struggling to think of how you could end up relying on this fact.)
Damien_The_Unbeliever
@Lasse: Now that I know the answer to the question, I won't. What I'm questioning is really why this constructor exists at all if its implementation is not specified.
Dan Tao
It's just that some parts of its behavior aren't documented; there's plenty of that in the .NET runtime.
Lasse V. Karlsen
@Lasse: That's true, and is completely desirable in many cases (what matters is what a method *does*, not how it works internally); but see my edit to the question. My point is that the whole purpose of `StringBuilder` (as I see it) is to improve the efficiency of some string operations in certain scenarios. If the most efficient way to utilize one of its constructors is going to depend on implementation, I question why that constructor is even offered.
Dan Tao
I agree, I especially agree with the sentiment that this should be documented. Any class that purports to increase performance by swapping out one code pattern with another should at the very least document every side-effect. Sometimes you need to know those things.
Lasse V. Karlsen
+2  A: 

The constructor you linked to is probably chained to the StringBuilder(String, Int32, Int32, Int32):

public StringBuilder(
  string value,
  int startIndex,
  int length,
  int capacity
)

So, for the string, it would probably pass through: string, 0, string.Length, string.Length. Or something similar that makes sense in the StringBuilder context.

Oded
Sounds like (from Damien's answer, for which he used Reflector) your first "probably" was right, and your second was close (actually the next power of 16, as opposed to just `string.Length`).
Dan Tao
...and by "the next power of 16", I mean "16 times the next power of 2". I don't think after 16 it goes straight to 256 and then to 4096 (I'm stupid).
Dan Tao
+1  A: 

The constructor that eventually gets called is:

// "Please read the following messages.".Length = 35
public StringBuilder(string value, int startIndex, int length, int capacity)

// ...which, for the string in the question, is effectively called as:
new StringBuilder("Please read the following messages.", 0,
        "Please read the following messages.".Length, 16)

(This is nothing that the other answers don't provide, and is just from Reflector.)
If the capacity is less than the length of the string, which it is in this case:

while (capacity < length)
{
    capacity *= 2;
    if (capacity < 0)
    {
        capacity = length;
        break;
    }
}

In Mono, the StringBuilder(string val) constructor allocates the capacity to int.MaxValue until an append occurs.

The real answer lies in the method that ends up being called internally in the CLR, where length is the capacity:

[MethodImpl(MethodImplOptions.InternalCall)]
private static extern string FastAllocateString(int length);

I can't find the source for this in the SSCLI; however, the Mono version (\mono\metadata\object.c) does it like this:

mono_string_new_size (MonoDomain *domain, gint32 len)
{
    MonoString *s;
    MonoVTable *vtable;
    size_t size = (sizeof (MonoString) + ((len + 1) * 2));

...
}

That is, the size in bytes of a MonoString object, plus two bytes for each of len + 1 characters (the length plus one, times 2).
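As a worked example of that arithmetic, here is the character part of the formula on its own. This is a sketch, not Mono code: StringSizeSketch and CharacterBytes are hypothetical names, and the sizeof (MonoString) header term is deliberately left out because its value depends on the runtime build.

```csharp
using System;

public static class StringSizeSketch
{
    // Hypothetical helper: the (len + 1) * 2 term only, i.e. two bytes
    // per character for len + 1 characters. The object header is excluded.
    public static long CharacterBytes(int len)
    {
        return ((long)len + 1) * 2;
    }

    public static void Main()
    {
        int len = "Please read the following messages.".Length;
        Console.WriteLine(len + " chars -> " + CharacterBytes(len) + " bytes of character data");
    }
}
```

For the 35-character string in the question that comes to (35 + 1) * 2 = 72 bytes, on top of the header.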

Chris S
Where is this information coming from? Are you using Windows, Mono, or...? It seems your findings regarding `StringBuilder`'s initial capacity differ from Damien's (and mine).
Dan Tao
@Dan - a small mistake with 0x10 :)
Chris S
@Chris: The code you've provided is helpful, but conflicts with what you've said! "The constructor uses the length of your string as the capacity" -- this isn't true; that `while` loop doubles `capacity` until `capacity >= length`; therefore it's going to be its starting value (which appears to be 16) times a power of 2.
Dan Tao
It was wrong; I've corrected it, but I'm tempted to just hit delete.
Chris S
Please don't -- the answer as it is now is extremely informative. It becomes completely clear how this particular implementation works (calls the `string, int, int, int` constructor, which doubles `capacity` (from 16) until it meets/exceeds `length`, which is set to `value.Length`).
Dan Tao