tags:

views:

1301

answers:

6

How do you compute the average height of a binary search tree when adding 1000 random ints? What is the average height?

A: 

it depends on the order the are added. If you start with the smallest value then the tree will be deeper because all new values will be added to the right child BST. If you add the largest value first then the left child will be deep while the right is empty.

Josh Curren
+2  A: 

You can compute the height of a binary tree using this recursive definition:

height(empty) = 0
height(tree) = 1 + max(height(tree.left), height(tree.right))

One way to empirically measure the average height of such a tree is to repeatedly create an empty tree and add 1000 random items to it. Measure the height of each trial using the above function, and average them.

I suspect your task is probably to find a formula for the average height of a binary tree.

Greg Hewgill
+4  A: 

It depends on whether you are using any sort of balanced tree structure (such as a red-black tree). Since you are inserting random numbers into a binary tree, it would be reasonable to expect that the average depth is approximately log2(1000) - so the values 10 and 11 would be 'normal'. I'm not sure how far it could deviate from that; no shallower than 10 levels, possibly somewhat deeper. An extreme case with no balancing would be 1000 deep; that is unlikely to happen with random numbers.

Jonathan Leffler
A: 

Regardless of what tree you are using the average height will be log2(1000), as someone mentioned before. It's true that depending on the order of the numbers inserted the actual height may vary, but assuming randomly distributed numbers, which you mention, then the actual value will, more often than not, approximate the expected value (Which, once again, is log2(1000))

jrcalzada
That's incorrect.For a binary tree to be balanced, the median element must be the first added node. There will only be a 1/N chance of this occuring to start with, and even after this the sub-trees on either side will need to be balanced.There is actually a very low probability that it will be log2(1000) by chance, a small fraction of 1/1000.
Andrew Shepherd
+1  A: 

This question is in fact, tricky. The answer will not be 1000, because that is improbable, but log2(1000) is also improbable, but even more so depending on how the tree is grown.

If you add an int by stepping though the tree then naively appending it the tree will virtually always be taller than log2(1000).

Talk to a statistician, because this seems related to normal probability distributions. Those are generated by lots of iterated random events( heads one unit right, tails ditto left), and the value of a random integer iterates through the tree as it settles out into a leaf.

+4  A: 
Andrew Shepherd
Nice work. Have you tried any sort of extrapolation from the values you've got to N=1000? Crude linear extrapolation based on H = 14 (at about N=120) and H = 18 (at about N=350) suggests H = 29 (~ 560/230 * 4 + 19) at N=1000. The curve is flatter than that; it is likely nearer the range 25-27, it seems to me.
Jonathan Leffler