ansaurus

Question

Want to save binary tree to disk for "20 questions" game

Answer 1

+5 A:

I would do a Level-order traversal. That is to say you are basically doing a Breadth-first search algorithm.

You have:

Create a qeueue with the root element inserted into it
Dequeue an element from the queue, call it E
Add the left and right children of E into the queue. If there is no left or right, just put a null node representation.
write node E to disk.
Repeat from step 2.

alt text

Level-order traversal sequence: F, B, G, A, D, I, C, E, H

What you will store on disk: F, B, G, A, D, NullNode, I, NullNode, NullNode, C, E, H, NullNode

Loading it back from disk is even easier. Simply read from left to right the nodes you stored to disk. This will give you each level's left and right nodes. I.e. the tree will fill in from top to bottom left to right.

Step 1 reading in:

Step 2 reading in:

  F 
B

Step 3 reading in:

  F 
 B  G

Step 4 reading in:

   F 
 B  G
A

And so on ...

Note: Once you have a NULL node representation, you no longer need to list its children to disk. When loading back you will know to skip to the next node. So for very deep trees, this solution will still be efficient.

Brian R. Bondy 2008-12-03 16:55:20

+1 for diagram and good explanation.

Dan 2008-12-03 17:34:46

You can also trim all the NullNodes from the end, I think. For some reason you trimmed all but 1.

Eyal 2010-03-06 23:25:50

Answer 2

+1 A:

A simple way to accomplish this is to traverse the tree outputting each element as you do so. Then to load the tree back, simply iterate through your list, inserting each element back into the tree. If your tree isn't self balancing, you may want to reorder the list in such a way that the final tree is reasonably balanced.

Dan 2008-12-03 17:06:54

Easiest solutions are usually best :) also serialization could help

drax 2008-12-03 17:27:16

Good idea, except it won't work if the tree is not a binary search tree (how do you know where to insert the next element?)

Rich 2008-12-03 17:28:42

When you build the tree, you know where to insert, so reading it back in is the same thing. I pretty much suggested the same thing as slim's answer, but without the code.

Dan 2008-12-03 17:34:10

I see, if there were information about where each level ended, you would know. Your comment about balancing it threw me off and made me think you were talking about BST insert.

Rich 2008-12-09 17:08:53

Well, if it is a BST, then reading them in whichever order you traversed will probably produce an unbalanced tree, which is why I mentioned it. The question mentioned a "binary tree", so I presumed balancing may be an issue. How did you accomplish this in the end? I'd be interested to know.

Dan 2008-12-12 20:08:16

Answer 3

A:

I would store the tree like this:

<node identifier>
node data
[<yes child identfier>
  yes child]
[<no child identifier>
  no child]
<end of node identifier>

where the child nodes are just recursive instances of the above. The bits in [] are optional and the four identifiers are just constants/enum values.

Skizz

Skizz 2008-12-03 17:09:34

Answer 4

+6 A:

You can store it recursively:

 void encodeState(OutputStream out,Node n) {
        if(n==null) {
            out.write("[null]");
        } else {
           out.write("{");
           out.write(n.nodeDetails());
           encodeState(out, n.yesNode());
           encodeState(out, n.noNode());
           out.write("}");
        }
  }

Devise your own less texty output format. I'm sure I don't need to describe the method to read the resulting output.

This is depth-first traversal. Breadth-first works too.

slim 2008-12-03 17:11:50

Yes of course! Thank you, this is beautiful...

Rich 2008-12-03 17:26:26

More precisely, this is a preorder traversal. [Knuth, vol. 1], what else?

John R. Strohm 2009-12-09 06:14:36

Answer 5

A:

The most arbitrary simple way is just a basic format that can be used to represent any graph.

<parent>,<relation>,<child>

Ie:

"Is it Red", "yes", "does it have wings" 
"Is it Red", "no" , "does it swim"

There isn't much redundancy here, and the formats mostly human readable, the only data duplication is that there must be a copy of a parent for every direct child it has.

The only thing you really have to watch is that you don't accidentally generate a cycle ;)

Unless that's what you want.

The problem here is rebuilding the tree afterwards. If I create the "does it have wings" object upon reading the first line, I have to somehow locate it when I later encounter the line reading "does it have wings","yes","Has it got a beak?"

This is why I traditionally just use graph structures in memory for such a thing with pointers going everywhere.

[0x1111111 "Is It Red"           => [ 'yes' => 0xF752347 , 'no' => 0xFF6F664 ], 
 0xF752347 "does it have wings"  => [ 'yes' => 0xFFFFFFF , 'no' => 0x2222222 ], 
 0xFF6F664 "does it swim"        => [ 'yes' => "I Dont KNOW :( " , ... etc etc ]

Then the "child/parent" connectivity is merely metadata.

Kent Fredric 2008-12-03 17:12:05

The problem here is rebuilding the tree afterwards. If I create the "does it have wings" object upon reading the first line, I have to somehow locate it when I later encounter the line reading "does it have wings","yes","Has it got a beak?"

slim 2008-12-03 17:19:32

Answer 6

+1 A:

Not sure it's elegant, but it's simple and explainable: Assign a unique ID to each node, whether stem or leaf. A simple counting integer will do.

When saving to disk, traverse the tree, storing each node ID, "yes" link ID, "no" link ID, and the text of the question or answer. For null links, use zero as the null value. You could either add a flag to indicate whether question or answer, or more simply, check whether both links are null. You should get something like this:

1,2,3,"Does it have wings?"
2,0,0,"a bird"
3,4,0,"Does it purr?"
4,0,0,"a cat"

Note that if you use the sequential integers approach, saving the node's ID may be redundant, as shown here. You could just put them in order by ID.

To restore from disk, read a line, then add it to the tree. You will probably need a table or array to hold forward-referenced nodes, e.g. when processing node 1, you'll need to keep track of 2 and 3 until you can fill in those values.

Frank Ames 2008-12-03 17:23:35

Answer 7

A:

In java if you were to make a class serializeable you can just write the class object to disc and read it back using input/output streams.

Aaron 2009-12-09 05:54:33

ansaurus

tags:

views:

answers:

Want to save binary tree to disk for "20 questions" game

related questions