I was using a SortedList() in a class that stores about 15K-100K items.

Recently my requirements changed: the data no longer needs to be stored sorted, so I switched to List().

However, I noticed that List() consumes about 20% more memory.

9K items:

  • SortedList: 105MB
  • List: 125MB

15K items:

  • SortedList: 115MB
  • List: 140MB

In the environment I develop for, memory is quite crucial. What can I use instead of List() to avoid this extra memory consumption while still having an unsorted list?

P.S. I do use a HashSet(Of String) to provide a uniqueness check alongside the List(Of), to simulate SortedList.ContainsKey(), although I don't think that could cause such memory overhead.

P.S. 2: My app has about 80 MB of base memory allocation at startup, so the numbers should read as 105-80=25, 125-80=45, and so on.
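The combination described in the first P.S. might look something like this (a minimal sketch; the class and member names are illustrative, not from my actual code):

```csharp
using System;
using System.Collections.Generic;

// Sketch: a List for storage plus a HashSet for the uniqueness check
// that SortedList.ContainsKey() used to provide.
class UniqueList
{
    private readonly List<string> items = new List<string>();
    private readonly HashSet<string> keys = new HashSet<string>();

    // Returns false instead of adding a duplicate.
    public bool Add(string item)
    {
        if (!keys.Add(item))   // HashSet.Add is the O(1) ContainsKey substitute
            return false;
        items.Add(item);
        return true;
    }

    public int Count { get { return items.Count; } }
}
```

Note that this stores every string reference twice (once in each collection), on top of the HashSet's own bucket overhead.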

RESULTS

Thanks for the all answers, final results are:

  • You should set the correct capacity to save memory
  • HashSet is very bad about memory, consuming far more than expected. This was the problem. Somehow SortedList() manages to use less memory for similar functionality.

Some benchmarks: 500-character strings, 250,000 inserts

  • List(Of String)(50000): 274 ms - 226 MB
  • SortedList(Of String, String)(50000): 34868 ms - 230 MB
  • HashSet: 420 ms - 232 MB
  • Dictionary(Of String, Object): 486 ms - 234 MB

However, when I decreased the character count to 25:

HashSet for 600,000 iterations: 300 MB, where List() is 286 MB

Also, regarding HashSet memory usage, see http://blog.mischel.com/2008/04/09/hashset-limitations/ Dictionary(Of String, Object) was not much better in my test either.

+1  A: 

If you already have a HashSet of your collection, I'm not sure why you need a List as well, but if you're looking for a container that guarantees uniqueness and provides ContainsKey() functionality, why not a generic Dictionary?
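For example (a minimal sketch, assuming you only need the keys and can leave the values null; "somekey" is illustrative):

```csharp
using System;
using System.Collections.Generic;

// Sketch: a Dictionary gives both uniqueness and ContainsKey() in one
// structure, replacing the List + HashSet pair.
var dict = new Dictionary<string, object>();

// Add throws on duplicates, so guard with ContainsKey first.
if (!dict.ContainsKey("somekey"))
    dict.Add("somekey", null);   // value unused; null keeps the cost to one reference

Console.WriteLine(dict.ContainsKey("somekey"));  // True
```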

Regardless of your decision on the questions above, using something like the Task Manager is just too inaccurate to make decisions about memory consumption in .NET. If you've not already done so, grab a trial of SciTech's .NET Memory Profiler or ANTS Profiler and run your app. Take a snapshot of your memory usage just before loading up your set and just after to compare. You can do this with several collection types to measure the relative memory usage of each in a highly accurate way.

Jerry Bullard
+1 for recommending the use of a profiler to get accurate data.
Martinho Fernandes
+5  A: 

Are you preallocating the List<T> capacity?

Small experiment that I did:

This program takes ~640 MB:

List<int> list = new List<int>(0); // start with zero capacity

for (int i = 0; i < 100000000; i++)
{
    list.Add(i); // the backing array is repeatedly doubled and copied as it fills
}

This program takes ~320 MB:

List<int> list = new List<int>(100000000); // preallocate the full capacity up front

for (int i = 0; i < 100000000; i++)
{
    list.Add(i); // no reallocation ever happens
}
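The difference comes from List<T>'s growth strategy: when the backing array is full, a new array of double the size is allocated and the contents copied over, so during a resize both arrays are live at once. A quick way to observe the doubling (a sketch):

```csharp
using System;
using System.Collections.Generic;

// Observe List<T>'s doubling growth: capacity jumps 0, 4, 8, 16, 32, ...
var list = new List<int>();
int lastCapacity = -1;
for (int i = 0; i < 100; i++)
{
    list.Add(i);
    if (list.Capacity != lastCapacity)   // report each reallocation
    {
        lastCapacity = list.Capacity;
        Console.WriteLine("Count={0}, Capacity={1}", list.Count, lastCapacity);
    }
}
```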
Shay Erlichmen
+1 This is an excellent observation, because a preallocated list will grab a contiguous block of RAM at once if it can, reducing the overhead generated by memory fragmentation.
Jerry Bullard
Very good point, I'll give it a try now. Doesn't the same problem exist for SortedList()?
dr. evil
All containers that have a "capacity" property need it to be set for optimal performance.
Shay Erlichmen
What I'm trying to say is that this should be irrelevant in this comparison, since I'm comparing SortedList() and List() and did not preallocate either of them.
dr. evil
List<T> is also optimized to store everything in one contiguous block; I don't know how SortedList<T> is implemented. You will have to check it for yourself. Don't forget to report the results :)
Shay Erlichmen
I'll let you know, BTW capacity makes a HUGE difference, thanks for the tip.
dr. evil
Apparently the problem was actually caused by HashSet(); it uses an enormous amount of memory. Now I need to find a better data type to store these unique keys.
dr. evil
+1  A: 

HashSets (and hashtables) use a lot of memory! Much more than a simple list or sorted list.

Lotfi
A Dictionary uses twice as much memory internally as a List (50% more on a 64 bit system), so the difference isn't that big.
Guffa
Well, it's not exactly 50%. Hashtables are never 100% full; they are resized when they reach about 70% load. Lists are only resized when they are completely full.
Lotfi
+1  A: 

A List<T> with 9K items would have a capacity between 9K and 18K, so the overhead for those items would be between 36 and 72 kilobytes (double that on a 64-bit system).

Clearly, 72 kB is nowhere near the 20 MB difference that you see, so the memory use of the list itself cannot be the cause. Especially considering that the sorted list also has to keep a reference to each object, the memory usage should be roughly the same.

So either something else is using memory, or you are not looking at the actual memory usage of the application. If you are looking in Task Manager, you are not seeing how much memory is used, only how much the memory manager has allocated.
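Spelling out the arithmetic above (a sketch; assumes 4-byte references on 32-bit and 8-byte on 64-bit, and that the worst case is a capacity of twice the item count right after a doubling):

```csharp
using System;

// Worst-case overhead of List<T>'s backing array for 9,000 reference-type items.
int items = 9000;
int maxCapacity = items * 2;                // up to 18,000 slots after a doubling
int bytesPerRef32 = 4, bytesPerRef64 = 8;   // reference size per platform

Console.WriteLine(maxCapacity * bytesPerRef32 / 1024);  // ~70 KB on 32-bit
Console.WriteLine(maxCapacity * bytesPerRef64 / 1024);  // ~140 KB on 64-bit
```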

Guffa
A: 

Check out Power Collections by Wintellect, the .NET equivalent of STL-style collections. I believe the Set type should give you the functionality you need (uniqueness), but you'd have to do the benchmarks for comparison. Just my 2 cents.

Abhijeet Patel
A: 

I would recommend looking at Glazed Lists (http://sites.google.com/site/glazedlists/). They are very quick at sorting and are very efficient with memory.

reccles