So I was just testing the CLR Profiler from Microsoft, and I wrote a little program that created a List<> with 1,000,000 doubles in it. I checked the heap, and it turned out the List<> size was around 124 KB (I don't remember exactly, but it was around that). This really rocked my world: how could it be 124 KB if it held 1 million doubles? Anyway, after that I decided to check a double[1000000]. And to my surprise (well, not really, since this is what I had expected for the List<> =P), the array size was 7.6 MB. HUGE difference!!

How come they're different? How does the List<> manage its items so (incredibly) memory-efficiently? I mean, it's not like the other 7.5 MB were somewhere else, because the size of the application was only around 3 or 4 KB bigger after I created the 1 million doubles.
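
In case anyone wants to reproduce this, the test boils down to the two allocations below (a rough sketch, not the exact program I ran; the ReadLine at the end just keeps the objects alive while the profiler takes a heap snapshot):

using System;
using System.Collections.Generic;

class ListVsArray
{
    static void Main()
    {
        // 1,000,000 doubles in a List<>
        var list = new List<double>();
        for (int i = 0; i < 1000000; i++)
            list.Add(0d);

        // 1,000,000 doubles in a plain array
        var array = new double[1000000];

        // Pause so the heap snapshot can be taken while both objects are still alive.
        Console.ReadLine();
        GC.KeepAlive(list);
        GC.KeepAlive(array);
    }
}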

+16  A: 

List<T> uses an array to store values/references, so I doubt there will be any difference in size apart from what little overhead List<T> adds.

Given the code below

var size = 1000000;
var numbers = new List<double>(size);
for (int i = 0; i < size; i++) {
   numbers.Add(0d);
}

the heap looks like this for the relevant object

0:000> !dumpheap -type Generic.List  
 Address       MT     Size
01eb29a4 662ed948       24     
total 1 objects
Statistics:
      MT    Count    TotalSize Class Name
662ed948        1           24 System.Collections.Generic.List`1[[System.Double,  mscorlib]]
Total 1 objects

0:000> !objsize 01eb29a4    <=== Get the size of List<Double>
sizeof(01eb29a4) =      8000036 (    0x7a1224) bytes     (System.Collections.Generic.List`1[[System.Double, mscorlib]])

0:000> !do 01eb29a4 
Name: System.Collections.Generic.List`1[[System.Double, mscorlib]]
MethodTable: 662ed948
EEClass: 65ad84f8
Size: 24(0x18) bytes
 (C:\Windows\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
65cd1d28  40009d8        4      System.Double[]  0 instance 02eb3250 _items    <=== The array holding the data
65ccaaf0  40009d9        c         System.Int32  1 instance  1000000 _size
65ccaaf0  40009da       10         System.Int32  1 instance  1000000 _version
65cc84c0  40009db        8        System.Object  0 instance 00000000 _syncRoot
65cd1d28  40009dc        0      System.Double[]  0   shared   static _emptyArray
    >> Domain:Value dynamic statics NYI
 00505438:NotInit  <<

0:000> !objsize 02eb3250 <=== Get the size of the array holding the data
sizeof(02eb3250) =      8000012 (    0x7a120c) bytes (System.Double[])

So the List<double> is 8,000,036 bytes and the underlying array is 8,000,012 bytes. This fits well with the usual 12 bytes of overhead for a reference type (the array) plus 1,000,000 times 8 bytes for the doubles. On top of that, List<T> adds another 24 bytes of overhead for the fields shown above.
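
As a quick sanity check, the same numbers fall out of simple arithmetic (a sketch assuming the 32-bit layout shown above; on a 64-bit process the per-object overhead is larger, so the totals shift a little):

const int elements     = 1000000;
const int bytesPerItem = 8;    // sizeof(double)
const int arrayHeader  = 12;   // x86 object header + length field for the array
const int listOverhead = 24;   // the List<double> instance itself (fields shown above)

int arrayBytes = arrayHeader + elements * bytesPerItem;   // 8,000,012
int listBytes  = listOverhead + arrayBytes;                // 8,000,036

Console.WriteLine("double[]     = {0:N0} bytes", arrayBytes);
Console.WriteLine("List<double> = {0:N0} bytes", listBytes);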

Conclusion: I don't see any evidence that List<double> will take up less space than double[] for the same number of elements.

Brian Rasmussen
That's right, I triple-checked and the sizes are almost the same; in fact the List<> is slightly larger. I don't know what could have gone wrong the first time. I did have some trouble running the app with the CLR Profiler, so maybe I was looking at a different instance of the application when I ran the 1,000,000-doubles code. Thanks for straightening this out.
Carlo
Carlo, I've learnt to question all results from analysis/profiling tools and always get a "second opinion", even when the results match my expectations. Often enough it's just inexplicable, but sometimes there's a deeper understanding hidden behind the "strange".
peterchen
+1  A: 

Please note that List<T> grows dynamically, usually doubling its capacity every time the internal buffer fills up. Hence, a new list starts with something like a 4-element array, and after you add the first 4 elements, adding the 5th causes an internal reallocation that doubles the buffer to 8 (4 * 2).
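
You can watch this happen by printing Capacity as elements are added (a small sketch; the exact initial capacity and growth factor are implementation details of the framework version, but doubling is the typical behaviour):

var list = new List<double>();
int lastCapacity = list.Capacity;
Console.WriteLine("Initial capacity: {0}", lastCapacity);
for (int i = 0; i < 100; i++) {
    list.Add(0d);
    if (list.Capacity != lastCapacity) {
        lastCapacity = list.Capacity;
        Console.WriteLine("Adding element {0,3} grew the buffer to {1,3}", i + 1, list.Capacity);
    }
}
// Typical output: element 1 grows the buffer to 4, element 5 to 8, element 9 to 16,
// element 17 to 32, and so on -- each reallocation doubles the capacity.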

Yurik