views:

282

answers:

2

First of all I assume I've missed something major when thinking about this, but I still wanted to post about it to see if I really didn't miss anything, on with it...

I have a pretty write heavy binary tree (about 50/50 between writes and reads), and on the way home today I was thinking about ways to optimize this, especially make the writes faster - this is what I came up with.

Considering that the operation add(T, x) to add x to tree T first consists of find(T, x) to see if x already exists, and in that case it doesn't return the parent so we can add it instead of one of the parents empty leaves.

What if we add a hash table as an intermediate cache to the add operation, so when we call add(T, x) what really happens is that x is hashed and inserted into the hash map M. And that's it. The optimization takes place when we somewhere else asks to find(T, x), now when we search the tree we will come to a leaf node, since x isn't inserted the tree yet (it only exists in the hash map M), we hash x and compare it to the keys in M to see if it is supposed to be in the tree. If it's found in M then we add it to the tree and remove it from M.

This would eliminate the find(T, x) operation on add(T, x) and reduce it to add(M, x) which is O(1). And then (ab)-use the find(T, x) operation that is performed when we look up the node the first time to insert it.

+5  A: 

Why not use a hashtable for everything and omit the binary tree entirely?

It all depends why you were using binary trees in the first place. If you chose binary trees to enhance sharing, you lose with the hashtable caches because hashtables are not shared.

The caches do not make comparing two maps any easier either.

EDIT:

If the operations that take advantage of the specificities of trees are rare (you mention taking advantage of the fact that RB trees are sorted) and if, on the other hand, you often look up a key that has recently been added, or replace the value of a key that has recently been added, a small-size cache implemented with another structure may make sense. You can also consider using a hashtable representation with the occasional conversion to a tree.

The additional complexity of this cache layer may mean that you do not gain any time in practice though, or not enough to repay the technical debt of having an ad-hoc data structure like this.

Pascal Cuoq
Well I use the binary tree because it's sorted. The tree (RB-tree in this case) is used as a sorted index that is updated frequently.
thr
+3  A: 

If your need is to have a structure that has O(1) inserts, and approximately O(n) amortized ordered iteration, I had the same problem:

http://stackoverflow.com/questions/1319763/key-ordered-dict-in-python

The answer (keeping a hash and a partially sorted list, and using a partially-sorted-structure-friendly sort like TimSort) worked very well in practice in my case.

LeMiz