I'm trying to decide which data structure to use to store key-value pairs when the only features I need are

  • insertion
  • lookup

Specifically, I don't need to be able to delete pairs, or iterate through keys/values/pairs.

The keys are integer tuples, the values are pointers (references, whatever). I'm only storing a couple million pairs spread out over (many) objects.

Currently I'm considering using either

  • a hash table
  • a kd-tree
  • a b-tree

I'm leaning toward the hash table (for the O(1) insertion/lookup time), but I wanted to confirm my leanings.

Which structure (of those above or other) would you recommend and why? If you would recommend a hash table, should I create a separate table for each object, or just create a single table and use the object's id as part of the key tuple?
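For concreteness, here is roughly the shape of table I have in mind, sketched in C++ (the pair key, the Record type, and the hash combiner are just placeholders for illustration):

    #include <cstddef>
    #include <cstdint>
    #include <unordered_map>
    #include <utility>

    struct Record;  // stand-in for whatever the stored pointers refer to

    // Hash for an integer-pair key; the combiner is only illustrative.
    struct PairHash {
        std::size_t operator()(const std::pair<std::int32_t, std::int32_t>& k) const {
            std::size_t h1 = std::hash<std::int32_t>()(k.first);
            std::size_t h2 = std::hash<std::int32_t>()(k.second);
            return h1 ^ (h2 + 0x9e3779b9u + (h1 << 6) + (h1 >> 2));
        }
    };

    // The "one table per object" variant would look like this:
    using PerObjectTable = std::unordered_map<std::pair<std::int32_t, std::int32_t>,
                                              Record*, PairHash>;

    // Usage:
    //   PerObjectTable table;
    //   table.emplace(std::make_pair(3, 7), ptr);        // insertion
    //   auto it = table.find({3, 7});                    // lookup
    //   if (it != table.end()) { Record* r = it->second; }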

+4  A: 

A hashtable will be the best choice here, since all the operations that matter to you are O(1). And because lookups stay O(1) regardless of how many entries a table holds, there's no performance reason to split the data across multiple hashtables.

Andrew Hare
O(1) vs. O(log n) aside -- I've read (anecdotally) that hash maps only outperform trees above some N, because their constant factor is high enough to rule them out in some cases. A couple million pairs sounds high enough to me, but do you have some numbers? Or is this so implementation-dependent that only profiling will help anyway?
gimpf
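One way to get actual numbers for a workload like this is a quick micro-benchmark; the sketch below (C++; the key encoding, sizes, and printed labels are arbitrary choices for illustration) simply times insertions and lookups against std::map and std::unordered_map on a couple million keys:

    #include <chrono>
    #include <cstddef>
    #include <cstdint>
    #include <cstdio>
    #include <map>
    #include <unordered_map>

    // Times n insertions followed by n lookups into the given table type.
    template <typename Table>
    double time_insert_lookup(std::size_t n) {
        Table t;
        std::size_t hits = 0;
        auto start = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i < n; ++i)
            t[static_cast<std::int64_t>(i) * 2654435761LL] = nullptr;      // insertion
        for (std::size_t i = 0; i < n; ++i)
            hits += t.count(static_cast<std::int64_t>(i) * 2654435761LL);  // lookup
        auto stop = std::chrono::steady_clock::now();
        std::printf("(%zu hits) ", hits);  // keeps the lookup loop from being optimized out
        return std::chrono::duration<double>(stop - start).count();
    }

    int main() {
        const std::size_t n = 2000000;  // "a couple million pairs"
        std::printf("std::map:           %.3f s\n",
                    time_insert_lookup<std::map<std::int64_t, void*>>(n));
        std::printf("std::unordered_map: %.3f s\n",
                    time_insert_lookup<std::unordered_map<std::int64_t, void*>>(n));
    }

The crossover point, if there is one, depends heavily on the implementation and on how expensive the hash and key comparison are, so running something like this against the real data is more informative than the asymptotic argument alone.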
+1  A: 

I'm a big fan of hash tables, since they are easy to use and implementations are available for pretty much every major language out there. The O(1) insertion/lookup is an especially good feature.

You should probably use a single table, to save on memory. Hash tables are notoriously memory-hungry, and a single shared table means you pay that overhead once rather than once per object (see the sketch after this answer).

Jonathan
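To illustrate the single-table option, here is a minimal sketch (C++ again; the three-field key with the object id folded in, and the hash, are assumptions made purely for illustration):

    #include <cstddef>
    #include <cstdint>
    #include <tuple>
    #include <unordered_map>

    struct Record;  // stand-in for whatever the stored pointers refer to

    // One shared table for all objects: the owning object's id becomes
    // part of the key instead of selecting a separate table.
    using Key = std::tuple<std::int32_t, std::int32_t, std::int32_t>;  // (object_id, x, y)

    struct KeyHash {
        std::size_t operator()(const Key& k) const {
            std::size_t h = std::hash<std::int32_t>()(std::get<0>(k));
            h = h * 31 + std::hash<std::int32_t>()(std::get<1>(k));
            h = h * 31 + std::hash<std::int32_t>()(std::get<2>(k));
            return h;
        }
    };

    using SharedTable = std::unordered_map<Key, Record*, KeyHash>;

With one shared table the per-table overhead (bucket array, load-factor slack) is paid once rather than once per object.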
+1  A: 

Hash tables would be useful here, and I see no reason to have more than one table.

Jambobond
This post has background on hash tables: http://stackoverflow.com/questions/371136/binary-trees-vs-linked-lists-vs-hash-tables
Jambobond
A: 

Most trees have an O(n ln n) lookup time, but hashtables have an O(1) lookup time, so that's the one you want to use. It's also very common, and often the implementation is highly-optimised to boot.

thecoop
I thought they had O(log n)?
gimpf