views:

1145

answers:

7

I'm writing a 7 card poker hand evaluator as one of my pet projects. While trying to optimize its speed (I like the challenge), I was shocked to find that the performance of Dictionary key lookups was quite slow compared to array index lookups.

For example, I ran this sample code that enumerates over all 52 choose 7 = 133,784,560 possible 7 card hands:

var intDict = new Dictionary<int, int>();
var intList = new List<int>();
for (int i = 0; i < 100000; i ++)
{
    intDict.Add(i, i);  
    intList.Add(i);
}

int result;

var sw = new Stopwatch();
sw.Start();
for (int card1 = 0; card1 < 46; card1++)
  for (int card2 = card1 + 1; card2 < 47; card2++)
    for (int card3 = card2 + 1; card3 < 48; card3++)
      for (int card4 = card3 + 1; card4 < 49; card4++)
        for (int card5 = card4 + 1; card5 < 50; card5++)
          for (int card6 = card5 + 1; card6 < 51; card6++)
            for (int card7 = card6 + 1; card7 < 52; card7++)
              result = intDict[32131]; // perform C(52,7) dictionary key lookups
sw.Stop();
Console.WriteLine("time for dictionary lookups: {0} ms", sw.ElapsedMilliseconds);

sw.Reset();

sw.Start();
for (int card1 = 0; card1 < 46; card1++)
  for (int card2 = card1 + 1; card2 < 47; card2++)
    for (int card3 = card2 + 1; card3 < 48; card3++)
      for (int card4 = card3 + 1; card4 < 49; card4++)
        for (int card5 = card4 + 1; card5 < 50; card5++)
          for (int card6 = card5 + 1; card6 < 51; card6++)
            for (int card7 = card6 + 1; card7 < 52; card7++)
              result = intList[32131]; // perform C(52,7) array index lookups
sw.Stop();
Console.WriteLine("time for array index lookups: {0} ms", sw.ElapsedMilliseconds);

which outputs:

time for dictionary lookups: 2532 ms
time for array index lookups: 313 ms

Is this type of behavior expected (performance decrease by a factor of 8)? IIRC, a Dictionary has, on average, O(1) lookups, while an array has worst-case O(1) lookups, so I do expect the array lookups to be faster, but not by this much!

I am currently storing poker hand rankings in a Dictionary. I suppose if this is as fast as the dictionary lookups can be, I have to rethink my approach and use arrays instead, although indexing the rankings will get a little tricky and I'll probably have to ask another question about it.

+11  A: 

Don't forget that Big-O notations only says how the complexity grows with respect to the size (etc) - it doesn't give any indication of the constant factors involved. That's why sometimes even a linear search for keys is faster than a dictionary lookup, when there are sufficiently few keys. In this case you're not even doing a search with the array though - just a straight indexing operation.

For straight index lookups, arrays are basically ideal - it's just a case of

pointer_into_array = base_pointer + offset * size

(And then a pointer dereference.)

Performing a dictionary lookup is relatively complicated - very fast compared with (say) a linear lookup by key when there are lots of keys, but much more complicated than a straight array lookup. It has to calculate the hash of the key, then work out which bucket that should be in, possibly deal with duplicate hashes (or duplicate buckets) and then check for equality.

As always, choose the right data structure for the job - and if you really can get away with just indexing into an array (or List<T>) then yes, that will be blindingly fast.

Jon Skeet
I wanted to ask a question about this, but didn't feel it warranted its own question. If strings are immutable, is the hashcode calculated once and stored internally, or is it recalculated every time GetHashCode() is called?
Chris Doggett
Chris: In .NET, it's recalculated every time. In Java it's cached. That's on a per instance basis, but of course a hashtable *does* remember the hash for each key.
Jon Skeet
Are dictionary lookups faster than Linq to datasets lookups?
LB
@LB: I'd expect so, yes.
Jon Skeet
A: 

An array lookup is about the fastest thing you can do - essentially all it is is a single bit of pointer arithmetic to go from the start of the array to the element you wanted to find. On the other hand, the dictionary lookup is likely to be somewhat slower since it needs to do hashing and concern itself with finding the correct bucket. Although the expected runtime is also O(1) - the algorithmic constants are greater so it will be slower.

1800 INFORMATION
+1  A: 

Welcome to Big-O notation. You always have to consider that there is a constant factor involved.

Doing one Dict-Lookup is of course much more expensive than an array lookup.

Big-O only tells you how algorithms scale. Double the amount of lookups and see how the numbers change: Both should take around the twice time.

ebo
+5  A: 

Is this type of behavior expected (performance decrease by a factor of 8)?

Why not? Each array lookup is almost intantaneous/negligeable, whereas a dictionary lookup may need at least an extra subroutine call.

The point of their both being O(1) means that even if you have 50 times more items in each collection, the performance decrease is still only a factor of whatever it is (8).

ChrisW
A: 

The cost of retrieving an element from a Dictionary is O(1), but that's because a dictionary is implemented as a hashtable - so you have to first calculate the hash value to know which element to return. Hashtables are often not that efficient - but they are good for large datasets, or datasets that have a lot of unique-hash values.

The List (apart from being a rubbish word used to dercribe an array rather than a linked list!) will be faster as it will return the value by directly calculating the element you want returned.

gbjbaanb
A: 

Something could take a millenium, and still be O(1).

If you single-step through this code in the disassembly window, you will quickly come to understand what the difference is.

Mike Dunlavey
A: 

Dictionary structures are most useful when the key space is very large and cannot be mapped into a stable, sequenced order. If you can convert your keys into a simple integer in a relatively small range, you will be hard-pressed to find a data structure that will perform better than an array.

On an implementation note; in .NET, dictionaries are essentially hashables. You can somewhat improve their key-lookup performance by ensuring that your keys hash into a large space of unique values. It looks like in your case, you are using a simple integer as a key (which I believe hashes to its own value) - so that may be the best you can do.

LBushkin