views:

132

answers:

2

According to this question the number of different search trees of a certain size is equal to a catalan number. Is it possible to enumerate those trees? That is, can someone implement the following two functions:

Node* id2tree(int id); // return root of tree

int  tree2id(Node* root); // return id of tree

(I ask because the binary code for the tree (se one of the answers to this question) would be a very efficient code for representing arbitrarily large integers of unknown range, i.e, a variable length code for integers

0 -> 0
1 -> 100
2 -> 11000
3 -> 10100
4 -> 1110000
5 -> 1101000
6 -> 1100100
7 -> 1011000
8 -> 1010100
etc

notice that the number of integers of each code length is 1, 1, 2, 5,.. (the catalan sequence). )

A: 

For general (non-search) binary trees I can see how this would be possible, since when building up the tree there are three choices (the amount of children) for every node, only restricted by having the total reach exactly N. You could find a way to represent such a tree as a sequence of choices (by building up the tree in a specific order), and represent that sequence as a base-3 number (or perhaps a variable base would be more appropriate).

But for binary search trees, not every organisation of elements is acceptable. You have to obey the numeric ordering constraints as well. On the other hand, since insertion into a binary search tree is well-defined, you can represent an entire tree of N elements by having a list of N numbers in a specific insertion order. By permuting the numbers to be in a different order, you can generate a different tree.

Permutations are of course easily counted by using variable-base numbers: You have N choices for the first item, N-1 for the second, etc. That gives you a sequence of N numbers that you can encode as a number with base varying from N to 1. Encoding and decoding from variable-base to binary or decimal is trivially adapted from a normal fixed-base conversion algorithm. (The ones that use modulus and division operations).

So you can convert a number to and from a permutation, and given a list of numbers you can convert a permutation (of that list) from and to a binary search tree. Now I think that you could get all the possible binary search trees of size N by permuting just the integers 1 to N, but I'm not entirely sure, and attempting to prove that is a bit too much for this post.

I hope this is a good starting point for a discussion.

Joren
Catalan numbers grow as O(4^n), which is slower than factorials.
A. Rex
You're right. I don't remember why I thought they grow faster, but I've removed the error. Thanks.
Joren
+1  A: 

It should be possible to convert the id to tree and back.

The id and bitstrings being:

0 -> 0 
1 -> 100 
2 -> 11000 
3 -> 10100 
4 -> 1110000 
5 -> 1101000 
6 -> 1100100 
7 -> 1011000 
8 -> 1010100 

First consider the fact that given a bitstring, we can easily move to the tree (a recursive method) and viceversa (preorder, outputting 1 for parent and 0 for leaf).

The main challenge comes from trying to map the id to the bitstring and vice versa.

Suppose we listed out the trees of n nodes as follows:

Left sub-tree n-1 nodes, Right sub-tree 0 nodes. (Cn-1*C0 of them)
Left sub-tree n-2 nodes, Right sub-tree 1 node.  (Cn-2*C1 of them)
Left sub-tree n-3 nodes, right sub-tree 2 nodes. (Cn-3*C2 of them)
...
...
Left sub-tree 0 nodes, Right sub-tree n-1 nodes. (C0*Cn-1 of them)

Cr = rth catalan number.

The enumeration you have given seems to come from the following procedure: we keep the left subtree fixed, enumerate through the right subtrees. Then move onto the next left subtree, enumerate through the right subtrees, and so on. We start with the maximum size left subtree, then next one is max size -1, etc.

So say we have an id = S say. We first find an n such that

C0 + C1 + C2 + ... + Cn < S <= C0+C1+ C2 + ... +Cn+1

Then S would correspond to a tree with n+1 nodes.

So you now consider P = S - (C0+C1+C2+ ...+Cn), which is the position in the enumeration of the trees of n+1 nodes.

Now we figure out an r such that Cn*C0 + Cn-1*C1 + .. + Cn-r*Cr < P <= Cn*C0 + Cn-1*C1 + .. + Cn-r+1*Cr-1

This tell us how many nodes the left subtree and the right subtree have.

Considering P - Cn*C0 + Cn-1*C1 + .. + Cn-r*Cr , we can now figure out the exact left subtree enumeration position(only considering trees of that size) and the exact right subtree enumeration position and recursively form the bitstring.

Mapping the bitstring to the id should be similar, as we know what the left subtree and right subtrees look like, all we would need to do is find the corresponding positions and do some arithmetic to get the ID.

Not sure how helpful it is though. You will be working with some pretty huge numbers all the time.

Moron