What is the best way to determine the initial capacity for collection objects?

views:

152

answers:

What is the best way to determine the initial capacity for collection objects?

When using objects that have a capacity, what are some guidelines that you can use to ensure the best effeciency when using to collections? It also seems like .NET framework has set some of these capacities low. For example, I think StringBuilder has an intial capacity of 16. Does this mean that after 16 strings are inserted into the StringBuilder, the StringBuilder object is reallocated and doubled in size?

+2 A:

With StringBuilder, it isn't the number of strings, but the number of characters. In general; if you can predict the length, go ahead and tell it - but since it uses doubling, there isn't a huge overhead in reallocating occasionally if you need to juts use Add etc.

In most cases, the difference will be trivial and a micro-optimisation. The biggest problem with not telling it the size is that unless the collection has a "trim" method, you might have nearly double the size you really needed (if you are very unlucky).

Marc Gravell 2009-03-17 14:10:00

+3 A:

If you know how large a collection or StringBuilder will be up front, it is good practice to pass that as the capacity to the constructor. That way, only one allocation will take place. If you don't know the precise number, even an approximation can be helpful.

Dustin Campbell 2009-03-17 14:10:16

Approximate high and then leave some room.

Henk Holterman 2009-03-17 14:14:18

+2 A:

There are only two circumstances where I ever explicitly set the capacity of a collection

I know the exact number of items that will appear in the collection and I'm using an Array or List<T>.
I am PInvoking into a function which writes to a char[] and i'm using a StringBuilder to interop with parameter. In this case you must set a capacity for the CLR to marshal to native code.

Interestingly, for #1 it is almost always done when I am copying data returned from a COM interface into a BCL collection class. So I guess you could say I only ever do this in interop scenarios :).

JaredPar 2009-03-17 14:13:29

It can be especially helpful for dictionaries since growing a dictionary is such a costly operation.

Dustin Campbell 2009-03-17 14:27:09

But the problem with setting the initial capacity of dictionaries is that you have to take into account the load factor.

Jim Mischel 2009-03-17 15:33:07

Speaking of StringBuilder, I'd dare to use the worst-case size. StringBuilder requires contigous memory block, which is hard to allocate on a highly fragmented heap.

I'd go with an estimation for other collections, though.

arul 2009-03-17 14:54:34

StringBuilder isn't special. Every object requires contiguous memory. I dare say you'd have much bigger problems if your heap was so fragmented you couldn't allocate a StringBuilder, unless it's one HUGE string.

Jim Mischel 2009-03-17 15:35:03

'Every object requires contiguous memory. ' Are you seriously implying that it's a common case to have objects like several mb's big? Try to allocate a 8mb big stringbuilder and fit 9mb of data into it, most likely it will fail, while 10mb stringbuilder might allocate just fine in the first place.

arul 2009-03-17 15:52:27

"Are you seriously implying that it's a common case to have objects like several mb's big?" Why, yes, I am. People regularly work with collections that contain tens of millions of items. 10M references = 40 MB (80 MB on a 64-bit system).

Jim Mischel 2009-03-17 16:03:54

The LinkedList<> does not require continuous memory. Correct?

Xaisoft 2009-03-17 16:04:02

No, the LinkedList<> does not require contiguous memory.

Jim Mischel 2009-03-17 16:34:13

ansaurus

tags:

views:

answers:

What is the best way to determine the initial capacity for collection objects?

related questions