views:

183

answers:

3

In .NET, there is a constructor for Dictionary<TKey, TValue> that takes one parameter, int capacity. This is the same as with many other collections such as List<T>, Queue<T>, and Stack<T>; furthermore, according to the MSDN documentation:

The capacity of a Dictionary is the number of elements that can be added to the Dictionary before resizing is necessary. As elements are added to a Dictionary, the capacity is automatically increased as required by reallocating the internal array.

This sounds to me pretty much the same as with other collections like List<T>, etc. Since these collections feature auto-resizing behavior when necessary and are therefore likely to have a greater capacity than required, most of them feature a TrimExcess method. This is handy if, say, you are adding an unknown number of items to the collection at one time, and after that you won't be adding any additional items.

Why does Dictionary<TKey, TValue> not have this same TrimExcess method?

(Disclaimer: I'm quite familiar with the "features do not exist by default" response; I guess I'm mostly just wondering if there's a particular reason why TrimExcess for a Dictionary does not make sense, or why it would be significantly more difficult to implement than for simpler collections like List.)

+3  A: 

I'd guess that in this case the capacity argument helps define the hashing function as well as the number of buckets; resizing/trimming a sparse collection of data would require recalculating hashes of all of the stored items remaining.

Joe
Actually, they use the hashcode of the Key object through `GetHashCode()` and remove the most significant bit. Then they store it in a location in the array by the remainder of `length % hash` (until a free one is found). Of course, the calculation of the hashcodes is key-class dependent.
Abel
In more technical detail, Dictionary chooses which bucket to place an item using "buckets[key.getHashCode() % buckets.Length] = value". Changing the length of the bucket list requires moving all values to new buckets.
Juliet
@Juliet: almost. The bucket is resized when needed and the whole list of entries is copied in the process and indexes are recalculated and thus the objects repositioned.
Abel
+3  A: 

This is partially a guess: a Dictionary is "ordered" as a hash table. The capacity that is reserved, is not simply a bunch of free memory addresses on top of your Dictionary. Instead, it consists of empty room throughout the Dictionary. This is done to make adding / moving / removing etc very efficient. If you had a TrimExcess method for Dictionary, the whole Dictionary would have to copy everything to a new location without any gaps between the elements.

Actually: the gaps should remain otherwise the benefit of a hash table becomes void, trimming (TrimExcess), if implemented, should only trim the internal ValueCollection.

Update: expanded and changed my ill-chosen words
Update: the BCL team says TrimExcess for Dictionaries "could be useful".

Abel
Dictionary is not ordered - it is implemented as a hash table. The ordered equivalent is SortedDictionary.
Paul Baker
I know, it is not ordered in that way, it is ordered as a hash. Sorry if that wasn't clear
Abel
+3  A: 

Per MSDN Dictionary is implemented as a hash table. If you trimmed excess you would have to come up with an algorithm that still provided close to O(1) lookup times in what would effectively be a randomly sorted list.

Bubbafat