views:

191

answers:

5

I'm doing a Dictionary<> lookup in an O(n^2) loop and need it to be ridiculously fast. It's not. Does anyone have any insight into how Dictionary<> is implemented? After running my code through a profiler and determining that Dictionary lookups are the bulk of the CPU time, I'm testing Dictionary performance with an isolated test case. My test code is like this:

Int32[] keys = new Int32[10] { 38784, 19294, 109574, 2450985, 5, 398, 98405, 12093, 909802, 38294394 };

Dictionary<Int32, MyData> map = new Dictionary<Int32, MyData>();
//Add a bunch of things to map


timer.Start();
Object item;
for (int i = 0; i < 1000000; i++)
{
   for (int j = 0; j < keys.Length; j++)
   {
      bool isFound = map.ContainsKey(keys[j]);
      if (isFound)
      {
         item = map[keys[j]];
      }
   }
}
timer.Stop();

ContainsKey and map[] are the two slow parts (equally slow). If I use TryGetValue instead, it's nearly identical in speed to ContainsKey. Here are some interesting facts:

A Dictionary<Guid, T> is about twice as slow as Dictionary<Int32, T>. Dictionary<String, T> is about twice as slow as a Guid dictionary. A Dictionary<Byte, T> is a good 50% faster than using Ints. This leads me to believe that a Dictionary is doing an O(log n) binary search to find the key, and the comparison operators on the keys are the bottleneck. For some reason, I don't believe it's implemented as a Hashtable, because .NET already has a Hashtable class, and in my experience it's even slower than Dictionary.

The dictionaries I'm building are only accessed by one thread at a time, so read locking is not an issue. RAM is also not an issue. The dictionary will most likely only have about 10 buckets, but each bucket can point to one of about 2,000 possible things. Does anyone have any feedback on how to make this faster? Thanks!

Mike

A: 

If you only have 10 buckets with 2000 things each, could you just build a single list with all 20000 things, directly indexable by a key known to your loop? For example:

List<MyData> ItemList = new List<MyData>();

//add all items to list indexed by their key (RAM is not an issue right?)

item = ItemList[key];

This way you reference them directly with no dictionary or hash lookup.

Joshua
Yeah, but building that list would take way too long. I do that several hundred thousand times.
Mike
How is List<> implemented? If it's not doing a hash, is lookup O(N)?
Mike
Oh, dumb question: List<> indexes by offset like an array. But I'm guessing Contains is O(N), or O(log N) when you pass in a compare function.
Mike
Mike: Contains is O(N); BinarySearch is O(lgN)
Gabe
Yeah, I figured that's why List.Contains() has an overload to pass in a comparer; then they could do a binary search. This makes sense.
Mike
+5  A: 

The dictionary is implemented using a hash table; I looked at the code with Reflector a while back.

"The dictionary will most likely only have about 10 buckets, but each bucket can point to one of about 2,000 possibly things."

There is your problem. The dictionary uses the hash to locate the bucket, but the lookup in the bucket is linear.

You have to implement a hash algorithm with a better distribution to get better performance. The relation should be at least the opposite, i.e. 2000 buckets with 10 items each.
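As a sketch of this technique (the question doesn't show the real key type, so `MyKey` and its fields are hypothetical): a key whose GetHashCode mixes all of its fields spreads entries across many buckets instead of collapsing into ~10.

```csharp
using System;

// Hypothetical composite key; this only illustrates the technique of
// providing a well-distributed hash, not the asker's actual key type.
public struct MyKey : IEquatable<MyKey>
{
    public readonly int GroupId; // e.g. ~10 distinct values
    public readonly int ItemId;  // e.g. ~2,000 distinct values

    public MyKey(int groupId, int itemId)
    {
        GroupId = groupId;
        ItemId = itemId;
    }

    public bool Equals(MyKey other)
    {
        return GroupId == other.GroupId && ItemId == other.ItemId;
    }

    public override bool Equals(object obj)
    {
        return obj is MyKey && Equals((MyKey)obj);
    }

    // Mix both fields so keys don't all land in the same handful of buckets.
    // The multiplier is an arbitrary large prime; unchecked allows the
    // multiplication to overflow harmlessly.
    public override int GetHashCode()
    {
        unchecked
        {
            return GroupId * 486187739 + ItemId;
        }
    }
}
```

Equal keys must produce equal hash codes (that's the GetHashCode/Equals contract); beyond that, the goal is simply that distinct keys rarely share a bucket.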

Guffa
+1 - exactly the problem. The hash algo is totally borked (as in: inefficient and not properly implemented).
TomTom
Well that explains why Int32 is so much faster. Int32 just uses itself for the hash code, whereas Guid and String probably produce some collisions. My test with an Int32 Dictionary can do about 10M lookups in 900ms, which is probably about as good as I'm gonna get.
Mike
So let me get this straight: Dictionary slows down when multiple keys "hash" to the same bucket, correct? Once you fix that, a lookup is O(1)? This basically answers my question.
Mike
OK, this seems to be working: once I implement GetHashCode on my key, my speeds are about identical to a Dictionary with an Int32 key.
Mike
Mike: In the example you have above, with .Net 4.0, `map` is a hash table with 17 buckets, of which 7 are filled. That means for every 10 keys you look up, 3 will have a collision. In your case, the collisions are 98405, 109574 in bucket 9 and 38294394, 398, 38784 in bucket 7. This means 38784 will require 3 comparisons, 109574 and 398 will require 2, and all other lookups will require 1 comparison.
Gabe
If I recall correctly, the bucket will be computed with hashvalue % n, where n is the number of buckets, correct? If this is the case, how does Dictionary<> know what n is? Does it get tricky and try to resize n on the fly as you add more stuff? Is there a way to set n? I want to prevent /all/ collisions. I don't care about memory in this case. Thanks!
Mike
@Mike: The dictionary uses something tricky, where n is a prime number; I don't know exactly what it does when the dictionary resizes. Anyhow, those are implementation details, and the dictionary combines buckets as part of the implementation. You could try to figure out how that works, but the implementation may change and you may end up hurting performance instead.
Guffa
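To make the bucket arithmetic from Gabe's earlier comment concrete, here is a small sketch assuming bucket = hash % n with n = 17; the real Dictionary<> internals differ and may change between framework versions.

```csharp
using System;

class BucketDemo
{
    static void Main()
    {
        // The sample keys from the question.
        int[] keys = { 38784, 19294, 109574, 2450985, 5,
                       398, 98405, 12093, 909802, 38294394 };

        // Int32.GetHashCode() returns the value itself, so for non-negative
        // keys the bucket is simply key % bucketCount.
        const int bucketCount = 17; // a prime, per Gabe's .NET 4.0 observation

        foreach (int key in keys)
            Console.WriteLine("{0} -> bucket {1}", key, key % bucketCount);
        // 38784, 398 and 38294394 all land in bucket 7;
        // 109574 and 98405 both land in bucket 9.
    }
}
```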
Is Dictionary(int) the constructor to set the number of buckets initially? Maybe I'm better off writing my own hashtable class.
Mike
@Mike: No, it only sets its capacity. That somehow affects how many actual buckets there are, but that's an implementation detail that you should not try to tamper with. The dictionary reduces the number of buckets to keep the memory footprint smaller, which makes it more likely that the dictionary fits in the cache, which would make it a lot faster.
Guffa
Well, the good news is I got my code from running in about 8 sec down to about 2.1 sec. The bottleneck is still in hashmap lookups. Also, adding values to the hash table accounts for 14% of the function time, which leads me to believe it's running into collisions and appending to a linked list, or "adjusting" itself. I think I might write my own hashtable class just for kicks.
Mike
A: 

It sounds like you're saying that your dictionary will only have 10 items in it. If so, a hash table may be unwarranted. You can just store your data in a list/array and either iterate over it or use a binary search to find your keys (try both to see what's faster).

If you use a binary search, your list will have to be sorted; if you just iterate over your list and there are some keys that are looked-up more frequently than others, you can put them at the beginning of the list to speed things up.
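A minimal sketch of the sorted-array approach, using the question's sample keys and a string stand-in for MyData:

```csharp
using System;

class BinarySearchDemo
{
    static void Main()
    {
        int[] keys = { 38784, 19294, 109574, 2450985, 5,
                       398, 98405, 12093, 909802, 38294394 };
        string[] values = new string[keys.Length]; // stand-in for MyData
        for (int i = 0; i < keys.Length; i++)
            values[i] = "item" + keys[i];

        // Sort keys and values together so the parallel arrays stay aligned.
        Array.Sort(keys, values);

        int index = Array.BinarySearch(keys, 98405); // O(log n)
        if (index >= 0)
            Console.WriteLine(values[index]); // prints item98405
    }
}
```

Array.BinarySearch returns a negative number when the key is absent, so the `index >= 0` check doubles as the ContainsKey test.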

On the other hand, if your keys are known in advance, you can write your own implementation of a hash table with a fast and perfect hash function (i.e. no collisions), and that should be unbeatable.

Gabe
How do you figure? Shouldn't a hash table be O(1)? That sounds better than iterating over ~10 items. Doing a binary search to find the keys would mean sorting my data structure, which is possible to do, but a bit of work. In theory, a hash table should still be quicker. If constructing a dictionary was slow, I'd have seen that while profiling.
Mike
You only have 10 keys, right? That means your lookup will be O(10), which is the same as O(1). In other words, your computer may be able to iterate a list of 10 items faster than it can call the hash function and perform the integer division required to index into the hash table.
Gabe
Oh yeah, I gotcha now. Good point, there's probably a minimum number of keys in a dictionary before hashing really pays off. I will definitely run some tests to confirm this theory.
Mike
Just tried it, and even with 10 values in the dictionary, computing the hash is much much MUUUUUCH faster than an O(10) lookup: 1786ms vs. 437ms.
Mike
Oh also, my speeds are about double with a Release build. The JIT'er must be doing some nifty inlining or something.
Mike
I would expect that the hash table would be faster, but it depends on your hash function. In any event, it's hard to believe that twice as fast as 437ms wouldn't be fast enough for you.
Gabe
Oh man, I'll be ecstatic if I get my real code to run in 437ms. I'm just doing tests here, and my loops simulate a typical scenario my actual code would run into. However, 437ms is with only one hash lookup per iteration (I'll probably have to do at least 2 or 3). It's also using ints as keys, which I'd have to refactor a whole bunch of code to do. Also, the 437ms is on a ship build; it's about 700ish on debug.
Mike
A: 

The insight into the inner workings of the hash table is spot on. You should definitely be using TryGetValue as your entire inner loop:

  map.TryGetValue(keys[j], out item);

Doing ContainsKey followed by the indexer does the hard part (the lookup) twice. The extra if and the extra keys[j] access are minor, but they add up in a tight loop. Using a foreach over your keys will probably be slower, but depending on the actual contents of the loop, it might be worth profiling.
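As a self-contained sketch of the rewritten loop (MyData here is a minimal stand-in for the question's type, and the dictionary contents are made up for the example):

```csharp
using System;
using System.Collections.Generic;

class MyData { public int Value; }

class TryGetValueDemo
{
    static void Main()
    {
        int[] keys = { 38784, 19294, 5 };
        var map = new Dictionary<int, MyData>();
        map[38784] = new MyData { Value = 1 };
        map[5] = new MyData { Value = 2 };

        MyData item;
        int found = 0;
        for (int j = 0; j < keys.Length; j++)
        {
            // One hash lookup instead of ContainsKey + indexer (two lookups).
            if (map.TryGetValue(keys[j], out item))
                found++;
        }
        Console.WriteLine(found); // prints 2 (38784 and 5 are present)
    }
}
```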

Dolphin
A: 

Adding to the comments about creating your own implementation based on knowledge of the data, here is an example that has no collisions. It may throw an OutOfMemoryException depending on the size of the objects; I tried using an int indexer, but that would throw an OutOfMemoryException. If null is returned, the item doesn't exist.

I haven't profiled this but I would expect minor speed improvements, but larger memory use.

public class QuickLookup<T> where T : class
{
    // One slot per possible short key: 0..32767 and -1..-32768.
    private T[] _positives = new T[short.MaxValue + 1];
    private T[] _negatives = new T[short.MaxValue + 1];
    public T this[short key]
    {
        get
        {
            return key < 0 ? _negatives[(-key) - 1] : _positives[key];
        }
        set
        {
            // Use the same index mapping as the getter: key -1 maps to
            // slot 0, which also keeps key -32768 in range (slot 32767).
            if (key < 0)
                _negatives[(-key) - 1] = value;
            else
                _positives[key] = value;
        }
    }
}
Tim Carter