views:

209

answers:

5

My C++ program creates an unbalanced BST from user input, and saves it do disk. It does this by first indexing each node by doing a pre-order traversal and assigning each node a unique number. Next, it outputs the BST to a file. It does this by doing a pre-order traversal and then for each node printing to file the data value, the index number of the left child, and the index number of the right child.

So after the BST is saved to disk, the BST in memory is destroyed. I want to read in the file and recreate the BST, so that it will be exactly as it was before.

A: 

Assuming you're not going for the best bound you could use this simple algorithm.

nodes_list = []
For each line i:
    Search for node i in nodes_list
    if(found):
        node_i = found_node
        node_i.set_value(line_value)
    else:
        node_i = new node(i)
        node_i.set_value(line_value)
        nodes_list.add(node_i)
    Search for line_left_node in nodes_list
    if(found):
        node_i.set_left_node(found_node)
    else:
        left_node = new node(line_left_node)
        node_i.set_left_node(left_node)
        nodes_list.add(left_node)
    Search for line_right_node in nodes_list
    if(found):
        node_i.set_right_node(found_node)
    else:
        right_node = new node(line_right_node)
        node_i.set_right_node(right_node)
        nodes_list.add(right_node)
Pace
Could you explain what you mean by "line"?
j_random_hacker
+1  A: 

Read all nodes into memory along with left and right child index (say hashtable). Then start from root (which is the first node in the file) and find the left and right child by index from hashtable. Repeat the same process for children nodes in DFS or BFS manner. Either should work.

You can optimize this to get rid of loading entire data into memory before constructing the tree. You can read the nodes and construct the tree in DFS manner. So adding left child is straight forward. While adding right child you have to check the index number. If the dont match, try to add that child in node higher than current node in DFS ordering.

malay
+1  A: 

Postorder would be simpler. You can just push restored nodes onto a stack and the children to be linked to the parent would always be the topmost entries of the stack.

You would only need one pass as well, as long as you recorded with each node which children were saved so you knew what to pop off the stack and which children to assign.

ergosys
That requires storing the external nodes. Consider the degenerate binary tree (a linked list essentially). If you're storing the external nodes, it's just as simple to do it pre-order. Read Self Recurse for left Child, Recurse for right child. Where the base case is simply an external node (i.e. NULL)
Kyle Butt
Not sure what you are getting at, presence of each child would be 2 bits in the node data. However I concede that preorder can be done in one pass as well, as your answer shows.
ergosys
+1  A: 

Let's assume you know the size of the tree (N) up-front. Maybe it's the first line in the file. If not, it's easy to adapt this to dynamically re-size the index vector. Note that this is semi-pseudocode:

// Only needed while parsing the file
std::vector<Node*> index(N, NULL);

// We can always create the root node.
// This simplifies the while loop below.
index[0] = createNode(0);

while (!in.eof()) {
    int nodeID = -1, leftID = -1, rightID = -1;
    parseNode(in, &nodeID, &leftID, &rightID);

    // Guaranteed to be non-NULL
    Node* node = index[nodeID];

    // if leftID or rightID is -1, createNode()
    // will simply return NULL.
    index[leftID] = createNode(leftID);
    index[rightID] = createNode(rightID);

    node->setLeftChild(index[leftID]);
    node->setRightChild(index[rightID]);
}

The reason node = index[nodeID] is guaranteed to be non-NULL is because of the pre-order indexing/write/read. We pre-created the root node at the start, and all other nodes are created by the parent as a left or right child.

EDIT: I just realized we don't need a full master index. We only need to store potential right-child nodes to expand in a stack. Here's that version of the algorithm:

// Candidate right-child nodes to expand
std::stack<Node*> rightNodes;

// Pre-create the root node as "left child"
Node* left = createNode(0);

while (!in.eof()) {
    // We already know the next node. It is the previous
    // node's left child (or root), or the nearest
    // parent's right child.
    Node* node;

    if (left != NULL) {
        node = left;
    }
    else {
        node = rightNodes.top();
        rightNodes.pop();
    }

    parseLine(in, &nodeID, &leftID, &rightID);
    assert(node->ID() == nodeID);

    // if leftID or rightID is -1, createNode()
    // will simply return NULL.
    left = createNode(leftID);
    Node* right = createNode(rightID);

    node->setLeftChild(left);
    node->setRigthChild(right);

    if (right != NULL) {
        rightNodes.push(right);
    }
}
Jason Govig
+1  A: 

If you store a binary search tree as a pre-order traversal, then the tree that you'd get simply by inserting the elements one at a time as you read them out would be the same tree as you started with. That would of course require O(n log n) time. If you're willing to store the external nodes (Nulls) Then you could do something like the following:

ReadBSTPreOrder(node ** target) {

    node * n = readNode();
    *target = n;

    if (n == NULL) return;
    ReadBSTPreOrder(&node->left);
    ReadBSTPreOrder(&node->right);
}

This has the added advantage of working for binary trees that aren't BST's as well. Storing the Nulls can be a single bit if you're willing to monkey around with the representation, but a single byte tag for a null and a different tag for a record should do. That'll save you from writing out the indices as well.

Kyle Butt
It would be simpler to just record a bit for each child in the node, recurse only if the particular child is present. +1 anyway.
ergosys
Note: recursion may be a problem for large tree depths (stack overflow). If that is a concern, you can convert the recursive algorithm to an equivalent iterative algorithm, maintaining your own heap-based stack (see my answer, for example).
Jason Govig