views:

2278

answers:

7

In short, I'd like to learn/develop an elegant method to save a binary tree to disk (a general tree, not necessarily a BST). Here is the description of my problem:

I'm implementing a game of "20-questions". I've written a binary tree whose internal nodes are questions and leaves are answers. The left child of a node is the path you'd follow if somebody answered "yes" to your current question, while the right child is a "no" answer. Note this is not a binary search tree, just a binary tree whose left child is "yes" and right is "no".

The program adds a node to a tree if it encounters a leaf that is null by asking the user to distinguish her answer from the one the computer was thinking of.

This is neat, because the tree builds itself up as the user plays. What's not neat is that I don't have a good way of saving the tree to disk.

I've thought about saving the tree as an array representation (for node i, left child is 2i+1, and 2i+2 right, (i-1)/2 for parent), but it's not clean and I end up with a lot of wasted space.

Any ideas for an elegant solution to saving a sparse binary tree to disk?

+5  A: 

I would do a Level-order traversal. That is to say you are basically doing a Breadth-first search algorithm.

You have:

  1. Create a qeueue with the root element inserted into it
  2. Dequeue an element from the queue, call it E
  3. Add the left and right children of E into the queue. If there is no left or right, just put a null node representation.
  4. write node E to disk.
  5. Repeat from step 2.

alt text

Level-order traversal sequence: F, B, G, A, D, I, C, E, H

What you will store on disk: F, B, G, A, D, NullNode, I, NullNode, NullNode, C, E, H, NullNode

Loading it back from disk is even easier. Simply read from left to right the nodes you stored to disk. This will give you each level's left and right nodes. I.e. the tree will fill in from top to bottom left to right.

Step 1 reading in:

F

Step 2 reading in:

  F 
B

Step 3 reading in:

  F 
 B  G

Step 4 reading in:

   F 
 B  G
A

And so on ...

Note: Once you have a NULL node representation, you no longer need to list its children to disk. When loading back you will know to skip to the next node. So for very deep trees, this solution will still be efficient.

Brian R. Bondy
+1 for diagram and good explanation.
Dan
You can also trim all the NullNodes from the end, I think. For some reason you trimmed all but 1.
Eyal
+1  A: 

A simple way to accomplish this is to traverse the tree outputting each element as you do so. Then to load the tree back, simply iterate through your list, inserting each element back into the tree. If your tree isn't self balancing, you may want to reorder the list in such a way that the final tree is reasonably balanced.

Dan
Easiest solutions are usually best :) also serialization could help
drax
Good idea, except it won't work if the tree is not a binary search tree (how do you know where to insert the next element?)
Rich
When you build the tree, you know where to insert, so reading it back in is the same thing. I pretty much suggested the same thing as slim's answer, but without the code.
Dan
I see, if there were information about where each level ended, you would know. Your comment about balancing it threw me off and made me think you were talking about BST insert.
Rich
Well, if it is a BST, then reading them in whichever order you traversed will probably produce an unbalanced tree, which is why I mentioned it. The question mentioned a "binary tree", so I presumed balancing may be an issue. How did you accomplish this in the end? I'd be interested to know.
Dan
A: 

I would store the tree like this:

<node identifier>
node data
[<yes child identfier>
  yes child]
[<no child identifier>
  no child]
<end of node identifier>

where the child nodes are just recursive instances of the above. The bits in [] are optional and the four identifiers are just constants/enum values.

Skizz

Skizz
+6  A: 

You can store it recursively:

 void encodeState(OutputStream out,Node n) {
        if(n==null) {
            out.write("[null]");
        } else {
           out.write("{");
           out.write(n.nodeDetails());
           encodeState(out, n.yesNode());
           encodeState(out, n.noNode());
           out.write("}");
        }
  }

Devise your own less texty output format. I'm sure I don't need to describe the method to read the resulting output.

This is depth-first traversal. Breadth-first works too.

slim
Yes of course! Thank you, this is beautiful...
Rich
More precisely, this is a preorder traversal. [Knuth, vol. 1], what else?
John R. Strohm
A: 

The most arbitrary simple way is just a basic format that can be used to represent any graph.

<parent>,<relation>,<child>

Ie:

"Is it Red", "yes", "does it have wings" 
"Is it Red", "no" , "does it swim"

There isn't much redundancy here, and the formats mostly human readable, the only data duplication is that there must be a copy of a parent for every direct child it has.

The only thing you really have to watch is that you don't accidentally generate a cycle ;)

Unless that's what you want.

The problem here is rebuilding the tree afterwards. If I create the "does it have wings" object upon reading the first line, I have to somehow locate it when I later encounter the line reading "does it have wings","yes","Has it got a beak?"

This is why I traditionally just use graph structures in memory for such a thing with pointers going everywhere.

[0x1111111 "Is It Red"           => [ 'yes' => 0xF752347 , 'no' => 0xFF6F664 ], 
 0xF752347 "does it have wings"  => [ 'yes' => 0xFFFFFFF , 'no' => 0x2222222 ], 
 0xFF6F664 "does it swim"        => [ 'yes' => "I Dont KNOW :( " , ... etc etc ]

Then the "child/parent" connectivity is merely metadata.

Kent Fredric
The problem here is rebuilding the tree afterwards. If I create the "does it have wings" object upon reading the first line, I have to somehow locate it when I later encounter the line reading "does it have wings","yes","Has it got a beak?"
slim
+1  A: 

Not sure it's elegant, but it's simple and explainable: Assign a unique ID to each node, whether stem or leaf. A simple counting integer will do.

When saving to disk, traverse the tree, storing each node ID, "yes" link ID, "no" link ID, and the text of the question or answer. For null links, use zero as the null value. You could either add a flag to indicate whether question or answer, or more simply, check whether both links are null. You should get something like this:

1,2,3,"Does it have wings?"
2,0,0,"a bird"
3,4,0,"Does it purr?"
4,0,0,"a cat"

Note that if you use the sequential integers approach, saving the node's ID may be redundant, as shown here. You could just put them in order by ID.

To restore from disk, read a line, then add it to the tree. You will probably need a table or array to hold forward-referenced nodes, e.g. when processing node 1, you'll need to keep track of 2 and 3 until you can fill in those values.

Frank Ames
A: 

In java if you were to make a class serializeable you can just write the class object to disc and read it back using input/output streams.

Aaron