views:

359

answers:

4

I've been able to find details on several self-balancing BSTs through several sources, but I haven't found any good descriptions detailing which one is best to use in different situations (or if it really doesn't matter). Specifically, I want a BST that is optimal for storing in excess of ten million nodes. The order of insertion of the nodes is basically random, and I will never need to delete nodes, so insertion time is the only thing that would need to be optimized. I indend to use it to store previously visited game states in a puzzle game, so that I can quickly check if a previous configuration has already been encountered.

A: 

The two self-balancing BSTs I'm most familiar with are red-black and AVL, so I can't say for certain if any other solutions are better, but as I recall, red-black has faster insertion and slower retrieval compared to AVL. So if insertion is a higher priority than retrieval, red-black may be a better solution.

Daniel F. Hanson
+3  A: 

Red-black is better than AVL for insertion-heavy applications. If you foresee relatively uniform look-up, then Red-black is the way to go. If you foresee a relatively unbalanced look-up where more recently viewed elements are more likely to be viewed again, you want to use splay trees.

lbrandy
+4  A: 

Why use a BST at all? From your description a dictionary will work just as well, if not better. The only reason for using a BST would be if you wanted to list out the contents of the container in key order. It certainly doesn't sound like you want to do that, in which case go for the hash table. O(1) insertion and search, no worries about deletion, what could be better?

boyetboy
A: 

[hash tables have] O(1) insertion and search

Citation needed. My claim-to-proof ratio is too high; do you have a proof?

(What I do believe is that they are fast in practice.)

Jonas Kölker
This one is conventional wisdom. Best case, O(1), obviously implementations will vary. There are a variety of different hash table algorithms as well.
ApplePieIsGood
"This one is conventional wisdom." -- I've heard it many times, but I still haven't seen a proof. I think it would be good to challenge this piece of folklore if you want the theoretical result "it's O(1)", or measure various lookup structures if you want "fast in practice"."Best case, O(1)" -- unbalanced search trees have that as well, yet no one argues that they have "O(1) insertion and search".
Jonas Kölker
a best case unbalanced search tree will be one node from balanced. Best case insertion/lookup is still log(n)
BioBuckyBall
In the best case, the user is searching for the value stored at the root node, which takes O(1) time to access...
Jonas Kölker