views: 262
answers: 4

I have been looking at .NET's List<T> and ArrayList implementations with Reflector.

When looking at Add(T item), I ran across the call this.EnsureCapacity(this._size + 1):

public void Add(T item)
{
    if (this._size == this._items.Length)
    {
        this.EnsureCapacity(this._size + 1);
    }
    this._items[this._size++] = item;
    this._version++;
}

So EnsureCapacity looks like this:

private void EnsureCapacity(int min)
{
    if (this._items.Length < min)
    {
        int num = (this._items.Length == 0) ? 4 : (this._items.Length * 2);
        if (num < min)
        {
            num = min;
        }
        this.Capacity = num;
    }
}
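
For completeness, the Capacity setter that EnsureCapacity assigns to looks roughly like this (a simplified sketch rather than the exact decompiled output): it allocates a new backing array of the requested size and copies the existing items across.

public int Capacity
{
    get { return this._items.Length; }
    set
    {
        if (value < this._size)
        {
            throw new ArgumentOutOfRangeException("value");
        }
        if (value != this._items.Length)
        {
            // Allocate a new backing array and copy the existing items into it.
            T[] newItems = new T[value];
            if (this._size > 0)
            {
                Array.Copy(this._items, 0, newItems, 0, this._size);
            }
            this._items = newItems;
        }
    }
}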

Why does the internal capacity default to 4 and then grow by a factor of 2 (i.e., double)?

A: 

I bet this is so you can create small lists without multiple allocations. The doubling of the size is for simplicity, rather than using a more complex scaling algorithm.

ChaosPandion
The underlying malloc probably likes powers of 2 as well, so it's not just simplicity.
I figured that was implied, since powers of 2 are so common in computing.
ChaosPandion
+5  A: 

4 is a good default, as most collections will only ever hold a few items. The doubling on resize is done to ensure that you don't perform a memory allocation every time you add an item.

See this good article by Joel on memory usage and why allocating double what you need is a good idea.

http://www.joelonsoftware.com/printerFriendly/articles/fog0000000319.html

Here's the relevant quote:

Suppose you wrote a smart strcat function that reallocates the destination buffer automatically. Should it always reallocate it to the exact size needed? My teacher and mentor Stan Eisenstat suggests that when you call realloc, you should always double the size of memory that was previously allocated. That means that you never have to call realloc more than lg n times, which has decent performance characteristics even for huge strings, and you never waste more than 50% of your memory.

As an aside, List<> and Dictionary<> now default to 10, but I would bet they use the same growth logic.
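
You can see the growth pattern for yourself by watching the Capacity property while adding items (a small sketch; the exact sequence may vary between framework versions):

using System;
using System.Collections.Generic;

class CapacityDemo
{
    static void Main()
    {
        var list = new List<int>();
        int lastCapacity = -1;

        for (int i = 0; i < 100; i++)
        {
            list.Add(i);
            // Report the capacity only when it changes.
            if (list.Capacity != lastCapacity)
            {
                lastCapacity = list.Capacity;
                Console.WriteLine("Count = {0,3}, Capacity = {1,3}", list.Count, list.Capacity);
            }
        }
        // On the implementation shown in the question this prints 4, 8, 16, 32, 64, 128.
    }
}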

chris.w.mclean
It is also defined behavior for Microsoft's implementation of the CLR (though it doesn't seem to be a requirement of ECMA-364).
opello
@chris.w.mclean: While doubling the capacity when a resize is needed does ensure that you don't do a memory allocation every time you add an item, it's not the only strategy that ensures this. For example, you could grow the capacity by a fixed amount, say two, every time a resize is needed, and that would also avoid an allocation on each Add. The point of this comment is that avoiding a reallocation on every add isn't the sole goal of resizing by doubling.
Jason
If you end up resizing once, chances are you will need to resize more than just twice. I bet there are internal studies at MS showing that typical usage grows in a way that justifies exponential growth such as the doubling gives. I bet there are also studies showing that 10 and 4 are, and were, good initial values.
James Deville
@Jason, yes you are correct, which is why I put the link to Joel there about why the doubling strategy is correct. He explains it much better than I... and I would also bet there were enough real-world memory usage studies from COM and VB to show that 4 was a good starting point, and another series of studies on instrumented .NET 1.0/1.1 code that showed 10 was probably a better new default.
chris.w.mclean
A: 

4 seems like a reasonable trade-off: large enough to accommodate the common case of lists holding 4 items or fewer, without wasting too many unused slots.

Each time the capacity needs to increase it doubles, so the list can then hold twice the number of items it already contains. This is similar to the growth strategy used by C++'s vector container.
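
As a practical usage note (not something this answer states directly): if you know roughly how many items you will add, you can sidestep the intermediate reallocations entirely by asking for the capacity up front:

using System.Collections.Generic;

class PresizeExample
{
    static List<int> BuildSquares(int count)
    {
        // Passing the expected size to the constructor allocates the backing
        // array once, so no doubling or copying happens during the Adds below.
        var squares = new List<int>(count);
        for (int i = 0; i < count; i++)
        {
            squares.Add(i * i);
        }
        return squares;
    }
}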

Phillip Ngan
Point of fact: it doesn't grow 'exponentially' but it does 'multiply' (by a factor of two).
opello
@opello: That's the definition of exponential growth. A constant proportional growth rate.
Jason
Sorry, you are indeed correct.
opello
+5  A: 

Regarding why doubling is used when a resize is needed: say you want to insert n items into a List. With doubling, the list is resized no more than about log n times, and the elements copied across all of those resizes form a geometric series (4 + 8 + 16 + ...) whose total is bounded by 2n. Therefore inserting n items takes worst-case O(n) time, so insertions are constant in amortized time. Further, the amount of wasted space is bounded above by n. Any strategy of constant proportional growth (growing by a fixed factor each time) gives constant amortized insertion time and linear wasted space. Growing faster than that could make inserts cheaper, but at the expense of more wasted space; growing slower could waste less space, but at the expense of slower inserts.

The small default capacity is there so that little memory is wasted, and there is no harm in starting small because, as we just saw, the doubling strategy is good from a time/space perspective.
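
To make the amortized argument concrete, here is a small sketch that simulates the doubling policy from the question and counts how many element copies the resizes perform for n insertions:

using System;

class GrowthCostDemo
{
    static void Main()
    {
        int n = 1000000;
        int capacity = 4;
        long copies = 0;

        // Simulate adding n items to an array that starts at capacity 4
        // and doubles whenever it is full.
        for (int size = 0; size < n; size++)
        {
            if (size == capacity)
            {
                copies += size;   // a resize copies every existing element
                capacity *= 2;
            }
        }

        // The copies form a geometric series (4 + 8 + 16 + ...), so the total
        // stays below 2n and the cost per insertion is O(1) amortized.
        Console.WriteLine("Items inserted: {0}, elements copied during resizes: {1}", n, copies);
    }
}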

Jason