ansaurus

Question

Answer 1

+3 A:

Dictionary makes no effort to keep track of a list of keys. So the iterator needs to walk the buckets. Many of these buckets, particularly for a large dictionary, many not have anything in them.

It may be helpful to compare OpenJDK's HashIterator.nextEntry and PrivateEntryIterator.nextEntry (which uses TreeMap.successor). The hash version walks an unknown number of entries looking for one that's non-null. This could be particularly slow if the hash table has had many elements removed (which it has in your case). In TreeMap, the only walking we do is our in-order traversal. There are no nulls in the way (only at the leaves).

Matthew Flaschen 2010-06-15 15:53:14

The amortized time per item returned, though, ought to be about the same regardless of the size of the dictionary.

Nick Johnson 2010-06-15 17:10:31

@Nick: No, it isn't. See my answer.

SLaks 2010-06-16 19:12:51

Modulo the edge case of removing items - which sounds like a weakness of .net's implementation - the proportion of filled buckets should be the same regardless of the size.

Nick Johnson 2010-06-16 22:16:44

@Nick, Not just .NET's implementation. Java suffers too. C++ STL doesn't.

Rotsor 2010-06-17 02:28:07

Answer 2

A:

Without looking, the simplest implementation of a sorted dictionary is a sorted list (like TreeSet) of keys and a hash combined; the list gives you the ordering, the dictionary gives you values. Thus the keys are already available. Hashtable does not have keys readily available, thus the culprit is not first, it's keys (all without any shred of evidence, feel free to test the hypothesis ;D )

Amadan 2010-06-15 15:55:09

.Net's `Dictionary<TKey, TValue>` uses a hash table.

SLaks 2010-06-15 15:57:09

Probably. I was speaking in general (using hashtable and dictionary interchangeably) - it should be applicable to any paradigm. In .net, specifically, they make a difference between the two in type enforcement, but it does not make any difference to the question at hand - the structure of data is same.

Amadan 2010-06-15 16:02:44

Answer 3

+1 A:

Well, Hash Tables aren't sorted, my guess is it it has to do some sort of sort before it can do an iteration, or some sort of scan, if its already sorted, it can just loop through.

Meiscooldude 2010-06-15 15:55:14

Although, I believe Dictionary is a Tree in the back end.

Meiscooldude 2010-06-15 15:56:09

.Net's `Dictionary<TKey, TValue>` uses a hash table.

SLaks 2010-06-15 15:56:48

Also, a remove on a tree could be kind of expensive.

Meiscooldude 2010-06-15 15:57:22

@SLaks thanks for the info.

Meiscooldude 2010-06-15 15:57:45

Answer 4

+7 A:

Dictionary<TKey, TValue> maintains a hash table.

Its enumerator will loop through the buckets in the hash table until it finds a non-empty bucket, then return the value in that bucket.
Once the dictionary grows large, this operation becomes expensive.
In addition, removing an item from the dictionary doesn't shrink the buckets array, so the First() call gets slower as you remove items. (Because it has to loop further to find a non-empty bucket)

By the way, you can avoid the value lookup like this: (This will not make it noticeably faster)

var kvp = todo.First();

//Use kvp.Key and kcp.Value

SLaks 2010-06-15 15:56:23

Yes, your explanation is correct and complete.By the way, Microsoft documentation says that GetEnumerator() operation is O(1) for Dictionary. Yet it doesn't say anything about the enumerator's MoveNext() performance. ;)

Rotsor 2010-06-15 16:16:14

Answer 5

A:

Reflector shows that Dictionary<TKey, TValue> maintains a Entry<TKey, TValue> array that it's KeyCollection<TKey, TValue>.Enumerator<TKey, TValue> uses. Normally, the lookup should be relatively fast, as it can just index into the array (assuming you don't want a sorted First):

// Dictionary<TKey. TValue>
private Entry<TKey, TValue>[] entries;

However, if you're removing the first elements of that array, then you end up walking the array until you find a non-empty one:

// Dictionary<TKey, TValue>.KeyCollection<TKey, TValue>.Enumerator<TKey, TValue>
while (this.index < this.dictionary.count) {
    if (this.dictionary.entries[this.index].hashCode >= 0) {
        this.currentKey = this.dictionary.entries[this.index].key;
        this.index++;
        return true;
    }
    this.index++;
}

As you remove your entries, you start getting more and more empties at the front of the entries array, and it becomes slower to retrieve First next time.

Mark Brackett 2010-06-15 16:52:58

ansaurus

tags:

views:

answers:

Why is Dictionary.First() so slow?

related questions