Randomized binary search trees like treaps give good performance (on the order of O(log n)) with high probability, while avoiding the complicated (and costly) rebalancing operations needed by deterministic balanced trees like AVL, red-black, AA, etc.
We know that if we insert random keys into a simple BST, we can expect it to be reasonably balanced. An intuitive reason is that the number of heavily unbalanced trees on n nodes is much lower than the number of "almost balanced" ones, so a random insertion order for the keys is likely to end up with an acceptable tree.
On this point, in "The Art of Computer Programming", Knuth gives about 1.39*lg2(n) (i.e. 2*ln(n)) as the average length of a search path, which is rather good. He also says that deleting a random key from a random tree preserves its randomness (and hence its good average balance).
It seems, then, that a binary search tree where keys are inserted and deleted in random order would most likely give performance on the order of O(log n) for all three operations: search, insert and delete.
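As a quick sanity check of that figure (just an illustrative sketch of mine, not something from Knuth), we can insert n distinct random keys into a plain BST and compare the average search path length against 1.386*lg2(n):

```python
import math
import random

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    # standard, unbalanced BST insertion
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def total_depth(node, depth=0):
    # sum of the depths of all nodes (the root has depth 0)
    if node is None:
        return 0
    return depth + total_depth(node.left, depth + 1) + total_depth(node.right, depth + 1)

n = 10_000
root = None
for k in random.sample(range(10 * n), n):  # n distinct keys in random order
    root = insert(root, k)

# average path length = average depth + 1; should come out close to 1.386*lg2(n)
print(f"avg path: {total_depth(root) / n + 1:.2f}  vs  "
      f"1.386*lg2(n): {1.386 * math.log2(n):.2f}")
```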
That said, I wonder if the following approach would give the same good properties:
- take a hash function h(x) that is known to be "good" (e.g. it ensures an even spread of the keys);
- order the nodes by h(x) applied to the keys, instead of by the keys' natural ordering;
- in case of collision, fall back to ordering by the key itself. Collisions should be rare if the hash function is good enough and its range is much bigger than the set of keys (a small sketch of this scheme follows the list).
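In code, the idea could look like this minimal sketch (the names HNode, h_insert and h_search are mine; h stands for whatever "good" hash function is chosen). Each node is ordered by the pair (h(key), key), so ties on the hash fall back to the natural order of the keys, exactly as in the last point:

```python
class HNode:
    def __init__(self, key, h):
        self.key = key
        self.rank = (h(key), key)   # tree ordering: hash first, key on collision
        self.left = None
        self.right = None

def h_insert(root, key, h):
    # plain (unbalanced) BST insertion, but driven by (h(key), key) instead of key
    if root is None:
        return HNode(key, h)
    rank = (h(key), key)
    if rank < root.rank:
        root.left = h_insert(root.left, key, h)
    elif rank > root.rank:
        root.right = h_insert(root.right, key, h)
    return root

def h_search(root, key, h):
    # search recomputes the same rank and walks the tree as usual
    rank = (h(key), key)
    while root is not None and root.rank != rank:
        root = root.left if rank < root.rank else root.right
    return root
```

Deletion would be the textbook unbalanced-BST deletion, again comparing ranks instead of keys.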
To give an example, a BST for the keys {4, 3, 5, 1, 2}, inserted in that order, would be:

```
      4
     / \
    3   5
   /
  1
   \
    2
```
Assuming the hash function maps them (respectively) to {221, 142, 12, 380, 18}, we would get:

```
        221(4)
        /    \
    142(3)  380(1)
      /
   12(5)
      \
      18(2)
```
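Using h_insert from the sketch above, the example can be reproduced by hard-coding the hypothetical hash values as the function (again, just for illustration):

```python
# hypothetical hash values from the example above, used as a stand-in for h
fake_h = {4: 221, 3: 142, 5: 12, 1: 380, 2: 18}.get

root = None
for k in [4, 3, 5, 1, 2]:
    root = h_insert(root, k, fake_h)

# root.key == 4 (rank 221); its children are 3 (142) and 1 (380);
# 5 (12) hangs to the left of 3, with 2 (18) as its right child
```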
The key point is that a "regular" BST may degenerate because the keys are inserted according to the same ordering relation used to store them in the tree (their "natural" ordering, e.g. the alphabetical order of strings). The hash function, instead, induces an ordering on the keys that is completely unrelated to the natural one and, hence, should give the same result as if the keys were inserted in random order.
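This is also why sorted input, the classic worst case for a plain BST, should be harmless here. A quick check, reusing h_insert and assuming a "good" hash built from SHA-256:

```python
import hashlib

def good_h(key):
    # assumed "good" hash: first 8 bytes of SHA-256 of the key's repr
    return int.from_bytes(hashlib.sha256(repr(key).encode()).digest()[:8], "big")

def height(node):
    return -1 if node is None else 1 + max(height(node.left), height(node.right))

root = None
for k in range(1, 1001):   # keys arrive in sorted order
    root = h_insert(root, k, good_h)

print(height(root))        # a few tens, not the degenerate 999
```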
A strong assumption is that the hash function is "good", but it's not an unreasonable one, I think.
I didn't find any reference to a similar approach in the literature, so it might be completely wrong, but I can't see why!
Do you see any drawback in my reasoning? Has anyone already attempted this?