tags:

views:

152

answers:

3

Is it worthwhile to initialize the collection size of a List<T> if it's reasonably known?

Edit: Furthering this question, after reading the first answers this question really boils down to what is the default capacity and how is the growth operation performed, does it double the capacity etc.?

+12  A: 

It is, as per documentation

If the size of the collection can be estimated, specifying the initial capacity eliminates the need to perform a number of resizing operations while adding elements to the List(T).

Asad Butt
What's the default capacity?
Sylvestre Equy
I think, it is "0". May check any time using List(T).Capacity prop
Asad Butt
+4  A: 

Well, it will stop you the values in the list (which will be references if the element type is a reference type) from having to be copied occasionally as the list grows.

If it's going to be a particularly large list and you've got a pretty good idea of the size, it won't hurt. However, if estimating the size involves extra calculations or any significant amount of code, I wouldn't worry about it unless you find it becomes a problem - it could distract from the main focus of the code, and the resizing is unlikely to cause performance issues unless it's a really big list or you're doing it a lot.

Jon Skeet
+5  A: 

Yes, it gets to be important when your List<> gets large. The exact numbers depend on the element type and the machine architecture, let's pick a List of reference types on a 32-bit machine. Each element will then take 4 bytes inside an internal array. The list will start out with a Capacity of 0 and an empty array. The first Add() call grows the Capacity to 4, reallocating the internal array to 16 bytes. Four Add() calls later, the array is full and needs to be reallocated again. It doubles the size, Capacity grows to 8, array size to 32 bytes. The previous array is garbage.

This repeats as necessary, several copies of the internal array will become garbage.

Something special happens when the array has grown to 65,536 bytes (16,384 elements). The next Add() doubles the size again to 131,072 bytes. That's a memory allocation that exceeds the threshold for "large objects" (85,000 bytes). The allocation is now no longer made on the generation 0 heap, it is take from the Large Object Heap.

Objects on the LOH are treated specially. They are only garbage collected during a generation 2 collection. And the heap doesn't get compacted, it takes too much time to moves such large chunks.

This repeats as necessary, several LOH objects will become garbage. They can take up memory for quite a while, generation 2 collections do not happen very often. Another problem is that these large blocks tend to fragment the virtual memory address space.

This doesn't repeat endlessly, sooner or later the List class needs to re-allocate the array and it has grown so large that there isn't a hole left in the virtual memory address space to fit the array. Your program will bomb with an OutOfMemoryException. Usually well before all available virtual memory has been consumed.

Long story short, by setting the Capacity early, before you start filling the List, you can reserve that large internal array up front. You won't get all those awkward released blocks in the Large Object Heap and avoid fragmentation. In effect, you'll be able to store many more objects in the list and your program runs a lot leaner since there's so little garbage. Do this only if you have a good idea how large the list will be, using a large Capacity that you'll never fill is wasteful.

Hans Passant
Very informative answer, I don't think I ever had to work with Lists greater than 10,000 elements but now if I ever think I will need to this knowledge will be very useful to understand the implications of Lists that large.
Chris Marisic
+1. I like this answer a lot.
RichardOD