I am in the middle of developing a custom persistent key-value data structure to compare against SQLite and Berkeley DB. Before writing the implementation, I wanted to find the best data structure for this purpose. I looked at a couple:

  • An open source red-black tree.
  • The Mono Dictionary implementation.

I wanted the data structures I picked to have performance numbers comparable to the .NET Dictionary.

I used a simple for-loop test with 500k iterations for inserts, and used the Stopwatch class to measure insert and key lookup times.

I noticed that:

  • Berkeley DB key lookup time was about the same as the Dictionary.
  • I ran my for-loop test against the C5 dictionary, a red-black tree implementation, and even Mono's dictionary implementation.

Insert time: 7% slower than the .NET Dictionary.
Lookup time: 1000% slower than the .NET Dictionary. This is even slower than the lookup speed with SQLite!! I repeated the test with compiler optimization turned on and still got similar results.

I realize I am comparing hashtables against trees etc., but I am stumped as to the performance discrepancy between all the data structures.

Does anybody have any ideas?

+3  A: 

Two thoughts:

  1. You should make sure you are not inadvertently including JIT time in your tests - this can add a considerable amount of time to the result. You should perform two runs in the same execution and discard the first run.

  2. You should make sure that you are not running under the debugger - this can dramatically skew performance results.
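A minimal sketch of the two points above, assuming a plain `Dictionary<int, int>` and the 500k loop size from the question. The class and method names (`WarmupBenchmark`, `TimeLookups`) are made up for illustration and are not the asker's actual test code. The first call pays for JIT compilation and is discarded; the second call is the number worth comparing.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class WarmupBenchmark
{
    // Hypothetical helper: inserts keys 0..count-1, then times one lookup per key.
    public static TimeSpan TimeLookups(int count)
    {
        var map = new Dictionary<int, int>(count);
        for (int i = 0; i < count; i++)
        {
            map[i] = i;
        }

        long checksum = 0;
        var timer = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            checksum += map[i];
        }
        timer.Stop();

        // Consume the checksum so the lookups have an observable result.
        if (checksum != (long)count * (count - 1) / 2)
        {
            throw new InvalidOperationException("unexpected checksum");
        }
        return timer.Elapsed;
    }

    public static void Main()
    {
        const int N = 500000;

        TimeSpan first = TimeLookups(N);   // includes JIT compilation; discard this one
        TimeSpan second = TimeLookups(N);  // same code path, already compiled

        Console.WriteLine("first run (discarded): " + first.TotalMilliseconds + " ms");
        Console.WriteLine("second run:            " + second.TotalMilliseconds + " ms");

        // Timings taken with a debugger attached are not representative.
        Console.WriteLine("debugger attached:     " + Debugger.IsAttached);
    }
}
```

Build it in Release mode and start it without the debugger before trusting the second number.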

Aside from that, any performance differences you see may very well be the result of the difference in performance between a hash table and a tree. A binary search tree typically has O(log(n)) performance on average for a lookup, but can degrade to O(n) in the worst case if it becomes unbalanced. A balanced tree, such as a red-black tree, guarantees O(log(n)). Hashtables, meanwhile, can approach O(1) time for lookups when hash collisions are avoided.
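To see the hash table versus tree difference using only framework types, here is a rough sketch that runs the same lookup loop against `Dictionary<int, int>` (hash table) and `SortedDictionary<int, int>` (tree-based, O(log n) lookups). It stands in for the red-black tree and C5 classes from the question, which are not used here. The names `HashVersusTree` and `TimeLookups` are hypothetical, and going through `IDictionary<,>` adds some interface-call overhead to both sides.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class HashVersusTree
{
    // Hypothetical helper: fills `map` with keys 0..count-1, then times one lookup per key.
    public static TimeSpan TimeLookups(IDictionary<int, int> map, int count)
    {
        for (int i = 0; i < count; i++)
        {
            map[i] = i;
        }

        long checksum = 0;
        var timer = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            checksum += map[i];
        }
        timer.Stop();

        // Consume the checksum so the lookups have an observable result.
        if (checksum != (long)count * (count - 1) / 2)
        {
            throw new InvalidOperationException("unexpected checksum");
        }
        return timer.Elapsed;
    }

    public static void Main()
    {
        const int N = 500000;

        // Warm-up pass so JIT compilation is not counted in the timed pass.
        TimeLookups(new Dictionary<int, int>(), 1000);
        TimeLookups(new SortedDictionary<int, int>(), 1000);

        TimeSpan hash = TimeLookups(new Dictionary<int, int>(), N);
        TimeSpan tree = TimeLookups(new SortedDictionary<int, int>(), N);

        Console.WriteLine("Dictionary (hash table): " + hash.TotalMilliseconds + " ms");
        Console.WriteLine("SortedDictionary (tree): " + tree.TotalMilliseconds + " ms");
    }
}
```

The exact ratio depends on the machine and runtime, so treat the output as a relative comparison only.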

I would also imagine that the .NET Dictionary class is highly optimized since it is a bread-and-butter data structure for so many different things in .NET. Also, a generic Dictionary<> may be able to avoid boxing, and therefore you may see some performance differences from that.
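As an illustration of the boxing point, here is a small sketch contrasting the non-generic `Hashtable`, which stores keys and values as `object`, with the generic `Dictionary<int, int>`. The class and method names are made up for the example, and it only shows where boxing happens; it does not measure the cost.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class BoxingDemo
{
    // Non-generic Hashtable: keys and values are objects, so every int is boxed.
    public static int SumFromHashtable(int count)
    {
        var table = new Hashtable();
        for (int i = 0; i < count; i++)
        {
            table[i] = i;            // boxes both the key and the value
        }

        int sum = 0;
        for (int i = 0; i < count; i++)
        {
            sum += (int)table[i];    // boxes the key for the lookup, unboxes the value
        }
        return sum;
    }

    // Generic Dictionary<int, int>: ints are stored and compared directly, no boxing.
    public static int SumFromDictionary(int count)
    {
        var map = new Dictionary<int, int>();
        for (int i = 0; i < count; i++)
        {
            map[i] = i;
        }

        int sum = 0;
        for (int i = 0; i < count; i++)
        {
            sum += map[i];
        }
        return sum;
    }

    public static void Main()
    {
        // Both produce the same sum; only the allocation behaviour differs.
        Console.WriteLine(SumFromHashtable(1000));
        Console.WriteLine(SumFromDictionary(1000));
    }
}
```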

LBushkin
I didn't think about the JIT implications. Good point.
Inuka G
That was it, it was the JIT! Something that I did not think of. I ran the test for several iterations, and the performance of the Mono dictionary was about the same as the .NET Dictionary, as expected. Thanks.
Inuka G
+1  A: 

Choose the data structure and repository depending on the data. That said, there is no perfect data structure. While the .NET Dictionary<,> is well optimized because it is often a good choice, it is not the answer to all problems - that would be 42...

Lucero
+1 for gratuitous HGTG reference.
SirDemon
+1  A: 

If all you need is a lookup, a red-black tree will not be your best data structure. It keeps its keys sorted, which makes each lookup O(log(n)) and generally slower than a hashtable lookup. If you want to compare the .NET Dictionary with a comparable C5 data structure, you would use C5.HashDictionary.

Rachel