views:

89

answers:

3

I will never be deleting from this data structure, but will be doing a huge number of lookups and insertions (~a trillion lookups and insertions). What is the best data structure for handling this?

Red-black and AVL trees seem decent, but are there any better suited for this situation?

+5  A: 

A hash table would seem to be ideal if you are only doing insertions and lookup by exact key.

Try Splay trees if you are doing insertions, and find/find-next on ordered keys.

I assume that most of your operations are going to be lookups, or you're going to need one heap of a lot of memory.

Ira Baxter
Size the hash table accordingly up-front to avoid re-hashing. You are not going to have a trillion insertions, right?
Thilo
A: 

I would choose a red-black tree or a hash table. Operations on a red-black is O(log2(n)). If implementet right the hash can have a O(1 + k/n). If implementet wrong it can be as bad as o(k). If what you are trying to do is just to make it as fast as possible, I would go with hash and do the extra work. Otherwise I would go with red-black. It is fairly simple and you know your running time.

IronMonkey
A: 

If all of the queries are successful (i.e., to elements that are actually stored in the table), then hashing is probably best, and you could experiment with various types of collision resolution, such as cuckoo hashing, which provides worst-case performance guarantees on lookups (see http://en.wikipedia.org/wiki/Hash_table).

If some queries are in between the stored keys, I would use van Emde Boas trees, y-fast trees, or fusion trees, which offer better performance than binary search trees (see http://courses.csail.mit.edu/6.851/spring10/scribe/lec09.pdf and http://courses.csail.mit.edu/6.851/spring10/scribe/lec10.pdf).

jonderry