views:

264

answers:

6

For an assignment we were given in Data Structures, we had to create a test class to determine whether the code we were given correctly traversed the binary trees in our test class.

These are the 3 constructors for BinaryTreeNode class that was given to us:

public BinaryTreeNode(Object theElement, BinaryTreeNode theleftChild, BinaryTreeNode therightChild)
{
    element = theElement;
    leftChild = theleftChild;
    rightChild = therightChild;
}

public BinaryTreeNode(Object theElement)
{
    element = theElement;
}

public BinaryTreeNode() {}

I quickly made the following in my test class to create one of the trees specified:

// tree for ( A - B ) / C
BinaryTreeNode b1 = new BinaryTreeNode("A");
BinaryTreeNode b2 = new BinaryTreeNode("-");
BinaryTreeNode b3 = new BinaryTreeNode("B");
BinaryTreeNode b4 = new BinaryTreeNode("/");
BinaryTreeNode b5 = new BinaryTreeNode("C");
BinaryTreeNode bAB = new BinaryTreeNode(b2, b1, b3);
BinaryTreeNode bRoot = new BinaryTreeNode(b4, bAB, b5);
q.put(bRoot);

However, my friend suggested I do it this way:

// tree for ( A - B ) / C
BinaryTreeNode b1 = new BinaryTreeNode("A");
BinaryTreeNode b2 = new BinaryTreeNode("B");
BinaryTreeNode b3 = new BinaryTreeNode("C");
BinaryTreeNode bRoot= new BinaryTreeNode("/", new BinaryTreeNode("-", b1, b2), b3);
q.put(bRoot);

However, he had a tough time explaining why this way was better. Could anybody explain why this is more efficient? If more code is needed from the example, just ask.

+3  A: 

The second one uses strings as 'element' values, the first one uses BinaryTreeNode. I don't know what is supposed to be the value of 'element' but my intuition says that it should be a String. That's the only difference between these snippets.

Dmitry
A: 

The element of each node doesn't need to be a BinaryTreeNode itself. Any arbitrary object will do.

Cory Petosky
In the first example above, the elements are very clearly BinaryTreeNodes. In the second example, they are not. Hence, the second example is better.
Cory Petosky
A: 

The second one reads better for one thing. If you are familiar with reverse polish notation, the second one reads like you would input into a RPN calculator. I think efficiency shouldn't really be an issue here considering this is just a test to validate the functionality of the class.

Spyplane
+2  A: 

It's a mixed bag ;-)

The second approach seems better in terms of its output:
It consistently puts a string in the element member of the BinaryTreeNode, whereby the first approach uses a mix of strings and BinaryTreeNode instances for that purpose.

At a minimum, I would suggest the first snippet to be rewritten as

BinaryTreeNode b1 = new BinaryTreeNode("A");
BinaryTreeNode b3 = new BinaryTreeNode("B");
BinaryTreeNode b5 = new BinaryTreeNode("C");
BinaryTreeNode bAB = new BinaryTreeNode("-", b1, b3);
BinaryTreeNode bRoot = new BinaryTreeNode("/", bAB, b5);

Another advantage of the second method is that it doesn't require extra storage for the intermediate nodes (bAB, for example above). This is not a big issue because these variables are reference types, and therefore require little storage and copy very efficiently. The 2nd method essentially build these nodes in-situ, and this could be considered more efficient. (This said, trying to create all intermediate nodes in this fashion would become quickly unworkable as the depth of the tree grows...)

There comes an interesting pause, a teaching moment ;-), where we reflect on the temptations of premature optimization...

On the other hand... One advantage of the first method (with the semantic change indicated), is that it makes for much a more regular outlay of the code, which may help locate issues in the tree structure more readily.
[Edit:] A possible hybrid solution would be to use the solution #2 to define the bottom two layers of the tree (the leaf-nodes and the layer above them), i.e. defining [up to] three nodes per line. The advantage of this would be to roughly halve the number of lines necessary to define a tree (could become a factor as the tree grows), and also to not have to figure out and reference a variable for the most numerous nodes (the leaf-nodes). Now... such a structure/syntax, would be much less flexible when/if the tree was appended to, with the need of re-balancing it etc. (maybe this is why the instructor suggested to hardcode the tree, to give the students a "feel" for the type of operations needed for re-balancing the tree.)

This said,

neither approach is Supercalifragilisticexpialidocious.

A slightly cooler approach may be to read the tree from a text file. The syntax in that text may mirror rather closely the one-named-node-per-line apparent in the layout of approach #1, so one may argue that in of itself it wouldn't be a big gain... Maybe only saving a few keystrokes. The advantages of such a solution however become more apparent when we consider that this de-couples the "data" from the "code". We do not need to recompile the program (or to have umpteen different versions of this binary) to work on different trees, we just point the program a different text files. Also this de-coupling becomes interesting when/if the code is refactored, say the name of the node class is modified. In the extreme this code vs. data independence would allow using a completely different tree library, the only necessary change would be to produce a bit of logic which maps the node construction and assembly after the logical structure of a node is parsed from the file.

mjv
I'd have probably read from a text file, but we were specifically asked to hard-code the trees, so that's out, unfortunately. I do like your rewrite though, it's a nice go-between.
StormPooper
Would using an array qualify as "hard-code"? If so, you'd get the advantage of a data-ified tree definition, without the need to parse an input file. You'd still be able to keep multiple versions of trees (simply several arrays with different names, or possibly an extra dimension in a big array), and rather than a distinct file name you'd receive a distinct "index of sorts" to designate _the_ tree to be built. Then, again, you may stick to truly hardcoded, and in that case, approach #1 would be my choice.
mjv
@StormPooper, see my edit about a possible "hybrid" solution whereby the bottom 2 layers are handled with approach #2 (i.e. up to 3 nodes per line of code) and the rest of the tree with approach #1: The best of both worlds: a more compact, yet regular, structure.
mjv
A: 

I'm the friend and my reasoning was that it would be inappropriate to have the element as a node, as in this circumstance we were using it to represent mathematical equations. It's my understanding that you could have a root node for a separate tree referenced as a node's element, but that would be totally inappropriate for this example.

The error I saw in the first way was that he was trying to store a whole sub-tree when he used the variable bAB (i.e. nodes b1, b2 and b3 within bAB) I felt that this would while working miss the point of using trees to represent mathematical equations.

Koe
+1  A: 

On the left is the structure the first code produces, the one of the second is on the right:

          * -- "/"            "/"
         / \                  / \
        /   \                /   \
"-" -- *    "C"            "-"   "C"
      / \                  / \
     /   \                /   \
   "A"   "B"            "A"   "B"

The proof of the pudding is in the eating: How would you like to process such a tree? In the second case, you start at the root node, read its element, if it is an operator, recursively read in the children as appropriate, then perform the operator's function on their values. In the first case, however, you don't read the element but check its type first -- if it is a reference to another tree node, it means that it is an operator, so you take that node's value as operator. This conclusion "it's a tree node, so it's an operator" should sound fishy to you. If you want to do dipatching on type, the types should have a meaning, so an operator should be of type "operator". For this exercise, though, I think that dispatching on the string value is sufficient. In conclusion: put the operators directly into the nodes.

Svante