tags:

views:

3669

answers:

14

It's been a while from those school years. Got a job as IT specialist at a hospital. Trying to move to do some actual programming now. I'm working on binary trees now, and I was wondering what would be the best way to determine if the tree is height-balanced.

I was thinking of something along this:

 public boolean isBalanced(Node root){
            if(root==null){
             return true;  //tree is empty
 }
 else{
  int lh = root.left.height();
  int rh = root.right.height();
  if(lh - rh > 1 || rh - lh > 1){
   return false;
  }
 }
 return true;
}

Is this a good implementation? or am I missing something?

+7  A: 

This only determines if the top level of the tree is balanced. That is, you could have a tree with two long branches off the far left and far right, with nothing in the middle, and this would return true. You need to recursively check the root.left and root.right to see if they are internally balanced as well before returning true.

Jesse Rusak
+1  A: 

Well, you need a way to determine the heights of left and right, and if left and right are balanced.

And I'd just return height(node->left) == height(node->right);

As to writing a height function, read: http://stackoverflow.com/questions/717725/understanding-recursion/717839#717839

tpdi
You want left and right heights to be within 1, not necessarily equal.
Alex B
+1  A: 

What kind of tree are you talking about? There are self-balancing trees out there. Check their algorithms where they determine if they need to reorder the tree in order to maintain balance.

lothar
+2  A: 

Balancing usually depends on the length of the longest path on each direction. The above algorithm is not going to do that for you.

What are you trying to implement? There are self-balancing trees around (AVL/Red-black). In fact, Java trees are balanced.

Uri
+3  A: 

If this is for your job, I suggest:

  1. do not reinvent the wheel and
  2. use/buy COTS instead of fiddling with bits.
  3. Save your time/energy for solving business problems.
Steven A. Lowe
A: 

So I changed the code, but it doesnt seem to be working. ps. I do have a method to determine the height. it's just height();

Any ideas on how to fix it.

public boolean heightBalanced(BinaryTree<T> tree){
 if(tree==null){
  return true;  //tree is empty
 }
 else if(tree.left!=null && tree.right!=null){
  int lh = tree.height(tree.left);
  int rh = tree.height(tree.right);
  if(lh - rh > 1 || rh - lh > 1){
   return false;
  }
  tree.heightBalanced(tree.left);
  tree.heightBalanced(tree.right);
 }
 return true;
 }
You're ignoring the return values from the 2 recursive calls to heightBalanced. Perhaps you want to return true only if they both return true.
Jesse Rusak
Your method returns tree if tree.left != null and tree.right = 'a huge subtree'.Also calling height(..) and heightBalanced(...) on the same subtree is rather inefficient. I would look at returning enough information from heightBalanced to avoid computing the height of each subtree twice
Rob Walker
Replace your else if block with an else block. Set lh and rh to 0 if the tree is null. Replace `tree.heightBalanced(tree.left)` with `tree.left == null || tree.heightBalanced(tree.left)`. Do the same for tree.right.
Brian
You do have a method to determine height, but it (depending on how your tree works) might not be O(1). If it isn't, then using that method will make your algorithm asymptotically slower...though that might not matter for your purposes.
Brian
+2  A: 

What balanced means depends a bit on the structure at hand. For instance, A B-Tree cannot have nodes more than a certain depth from the root, or less for that matter, all data lives at a fixed depth from the root, but it can be out of balance if the distribution of leaves to leaves-but-one nodes is uneven. Skip-lists Have no notion at all of balance, relying instead on probability to achieve decent performance. Fibonacci trees purposefully fall out of balance, postponing the rebalance to achieve superior asymptotic performance in exchange for occasionally longer updates. AVL and Red-Black trees attach metadata to each node to attain a depth-balance invariant.

All of these structures and more are present in the standard libraries of most common programming systems (except python, RAGE!). Implementing one or two is good programming practice, but its probably not a good use of time to roll your own for production, unless your problem has some peculiar performance need not satisfied by any off-the-shelf collections.

TokenMacGuy
A: 

Wouldn't this work?

return ( ( Math.abs( size( root.left ) - size( root.right ) ) < 2 );

Any unbalanced tree would always fail this.

Sumit Kishore
Many balanced trees will fail it, too.
Brian
+16  A: 

Stumbled across this old question while searching for something else. I notice that you never did get a complete answer.

The way to solve this problem is to start by writing a specification for the function you are trying to write.

Specification: A well-formed binary tree is said to be "height-balanced" if (1) it is empty, or (2) its left and right children are height-balanced and the height of the left tree is within 1 of the height of the right tree.

Now that you have the specification, the code is trivial to write. Just follow the specification:

IsHeightBalanced(tree)
    return (tree is empty) or 
           (IsHeightBalanced(tree.left) and
            IsHeightBalanced(tree.right) and
            abs(Height(tree.left) - Height(tree.right)) <= 1)

Translating that into the programming language of your choice should be trivial.

Bonus exercise: this naive code sketch traverses the tree far too many times when computing the heights. Can you make it more efficient?

Super bonus exercise: suppose the tree is massively unbalanced. Like, a million nodes deep on one side and three deep on the other. Is there a scenario in which this algorithm blows the stack? Can you fix the implementation so that it never blows the stack, even when given a massively unbalanced tree?

Eric Lippert
+1 for only correct answer, I can't believe no one was able to answer this for 8 months...
BlueRaja - Danny Pflughoeft
Answer to the "exercises" below…
Potatoswatter
Answered Bonus exercise below.
Brian
A: 

Here is a version based on a generic depth-first traversal. Should be faster than the other correct answer and handle all the mentioned "challenges." Apologies for the style, I don't really know Java.

You can still make it much faster by returning early if max and min are both set and have a difference >1.

public boolean isBalanced( Node root ) {
    int curDepth = 0, maxLeaf = 0, minLeaf = INT_MAX;
    if ( root == null ) return true;
    while ( root != null ) {
        if ( root.left == null || root.right == null ) {
            maxLeaf = max( maxLeaf, curDepth );
            minLeaf = min( minLeaf, curDepth );
        }
        if ( root.left != null ) {
            curDepth += 1;
            root = root.left;
        } else {
            Node last = root;
            while ( root != null
             && ( root.right == null || root.right == last ) ) {
                curDepth -= 1;
                last = root;
                root = root.parent;
            }
            if ( root != null ) {
                curDepth += 1;
                root = root.right;
            }
        }
    }
    return ( maxLeaf - minLeaf <= 1 );
}
Potatoswatter
A nice attempt but it clearly does not work. Let x be a null node. Let a non-null tree node be denoted as (LEFT VALUE RIGHT). Consider the tree (x A (x B x)). "root" points to nodes A, B, A, B, A, B ... forever. Care to try again? A hint: it's actually easier *without* parent pointers.
Eric Lippert
@Eric: Oops, fixed (I think). Well, I'm trying to do this without O(depth) memory, and if the structure doesn't have parent pointers (it often does), you need to use a stack.
Potatoswatter
So what you're telling me is you'd rather use O(n) permanent memory in parent pointers to avoid allocating O(d) temporary memory, where log n <= d <= n ? This seems like a false economy.
Eric Lippert
Unfortunately, though you've fixed the problem with the traversal, there's a far bigger problem here. This does not test whether a tree is balanced, it tests whether a tree has all its leaves close to the same level. That's not the definition of "balanced" that I gave. Consider the tree ((((x D x) C x) B x) A x). Your code reports that this is "balanced" when it obviously is maximally unbalanced. Care to try again?
Eric Lippert
@Eric reply 1: not a false economy if you already use the parent pointers for something else. reply 2: sure, why not. This is a bizarre way of debugging… I shouldn't be blindly writing traversals of anything at 4 am…
Potatoswatter
Also, note that a height-balanced tree can have leaves whose depths differ by two. For example, ((((x H x) D x) B (x E x)) A (((x I x) F x) C ((x J x) G (x K (x L x))))) is height balanced, but the depth of leaf E is two less than the depth of L.
Eric Lippert
@Eric: There are various definitions of balance. A red-black tree's leaves can vary in depth by a factor of two. You can adjust the `return` expression accordingly.
Potatoswatter
Indeed, there are many definitions of balance. However, it's still not the case that you can work out whether a given tree is balanced according to my definition solely by locating the deepest and shallowest leaves. I can construct you height-balanced trees where the deepest and shallowest leaves are arbitrarily far apart, given enough nodes. Now, if you disagree with my claim that your definition of height-balanced is not the same as mine then I encourage you to attempt a formal proof that they are equivalent.
Eric Lippert
@Eric: Absolutely, they are not equivalent. Not all trees with max_leaf_depth <= 2 * min_leaf_depth are valid red-black trees! However, that's beyond the scope of the OP. I used the most stringent definition of balance, which happens to be universal. As for practicality, I confess I don't know the application of validating a tree's balance without simultaneously balancing it.
Potatoswatter
You can fix your implementation by tracking the depth of not the lowest and highest *leaves* but the lowest and highest *nodes that have any null child*. You'll still be testing for a different definition of "balanced" than the one I proposed; as you note, yours is a considerably more restricting definition. It is an interesting challenge to try and implement a tester for my more relaxed definition without consuming unbounded call stack space.
Eric Lippert
Also, the reason to validate a tree's balance without balancing it is to enable you to write a test suite that verifies that your tree balancer is correct!
Eric Lippert
@Eric: That is the change I applied last night. As for an AVL tester with O(1) memory (given parent links), just use this answer as a template for an O(1)-memory depth-first traversal, and perform a nested depth-first traversal at each node to find the heights of its subtrees. That will use O(1) memory (but O(N) parent links) and O(N^2) time.
Potatoswatter
@Eric: interestingly, the parent links may actually optimize something in that algorithm: the checking algorithm may be multithreaded and share a common set of parent links, potentially using *less* memory than a multithreaded version where each thread has a stack.
Potatoswatter
Ah, so you did. Clearly I did not read the change carefully enough!
Eric Lippert
@Eric: no prob ;v)
Potatoswatter
+1  A: 

Bonus exercise response. The simple solution. Obviously in a real implementation one might wrap this or something to avoid requiring the user to include height in their response.

IsHeightBalanced(tree, out height)
    if (tree is empty)
        height = 0
        return true
    balance = IsHeightBalanced(tree.left, heightleft) and IsHeightBalanced(tree.right, heightright)
    height = max(heightleft, heightright)+1
    return balance and abs(heightleft - heightright) <= 1     
Brian
Nicely done! -------
Eric Lippert
A: 

Here's what i have tried for Eric's bonus exercise. I try to unwind of my recursive loops and return as soon as I find a subtree to be not balanced.

int heightBalanced(node *root){
    int i = 1;
    heightBalancedRecursive(root, &i);
    return i; 
} 

int heightBalancedRecursive(node *root, int *i){

    int lb = 0, rb = 0;

    if(!root || ! *i)  // if node is null or a subtree is not height balanced
           return 0;  

    lb = heightBalancedRecursive(root -> left,i);

    if (!*i)         // subtree is not balanced. Skip traversing the tree anymore
        return 0;

    rb = heightBalancedRecursive(root -> right,i)

    if (abs(lb - rb) > 1)  // not balanced. Make i zero.
        *i = 0;

    return ( lb > rb ? lb +1 : rb + 1); // return the current height of the subtree
}
Sudharsanan
+2  A: 

Balance is a truly subtle property; you think you know what it is, but it's so easy to get wrong. In particular, even Eric Lippert's (good) answer is off. That's because the notion of height is not enough. You need to have the concept of minimum and maximum heights of a tree (where the minimum height is the least number of steps from the root to a leaf, and the maximum is... well, you get the picture). Given that, we can define balance to be:

A tree where the maximum height of any branch is no more than one more than the minimum height of any branch.

(This actually implies that the branches are themselves balanced; you can pick the same branch for both maximum and minimum.)

All you need to do to verify this property is a simple tree traversal keeping track of the current depth. The first time you backtrack, that gives you a baseline depth. Each time after that when you backtrack, you compare the new depth against the baseline

  • if it's equal to the baseline, then you just continue
  • if it is more than one different, the tree isn't balanced
  • if it is one off, then you now know the range for balance, and all subsequent depths (when you're about to backtrack) must be either the first or the second value.

In code:

class Tree {
    Tree left, right;
    static interface Observer {
        public void before();
        public void after();
        public boolean end();
    }
    static boolean traverse(Tree t, Observer o) {
        if (t == null) {
            return o.end();
        } else {
            o.before();
            try {
                if (traverse(left, o))
                    return traverse(right, o);
                return false;
            } finally {
                o.after();
            }
        }
    }
    boolean balanced() {
        final Integer[] heights = new Integer[2];
        return traverse(this, new Observer() {
            int h;
            public void before() { h++; }
            public void after() { h--; }
            public boolean end() {
                if (heights[0] == null) {
                    heights[0] = h;
                } else if (Math.abs(heights[0] - h) > 1) {
                    return false;
                } else if (heights[0] != h) {
                    if (heights[1] == null) {
                        heights[1] = h;
                    } else if (heights[1] != h) {
                        return false;
                    }
                }
                return true;
            }
        });
    }
}

I suppose you could do this without using the Observer pattern, but I find it easier to reason this way.


[EDIT]: Why you can't just take the height of each side. Consider this tree:

        /\
       /  \
      /    \
     /      \_____
    /\      /     \_
   /  \    /      / \
  /\   C  /\     /   \
 /  \    /  \   /\   /\
A    B  D    E F  G H  J

OK, a bit messy, but each side of the root is balanced: C is depth 2, A, B, D, E are depth 3, and F, G, H, J are depth 4. The height of the left branch is 2 (remember the height decreases as you traverse the branch), the height of the right branch is 3. Yet the overall tree is not balanced as there is a difference in height of 2 between C and F. You need a minimax specification (though the actual algorithm can be less complex as there should be only two permitted heights).

Donal Fellows
Ah, good point. You could have a tree which is h(LL)=4, h(LR)=3, h(RL)=3, h(RR)=2. Thus, h(L)=4 and h(R)=3, so it would appear balanced to the earlier algorithm, but with a max/min depth of 4/2, this is not balanced. This would probably make more sense with a picture.
Tim
That's what I've just added (with the world's nastiest ASCII graphic tree).
Donal Fellows
A: 

Here's a non recursive solution. Doesn't calculate any height.

int heightBalanced(node* root){

  int flag = 0 ; // tells how many children a node does NOT have. valid values - 0, 1, or 2. if value = 2, then leaf node
  int parentFlag = 0 ; // tells how many children the parent of the current node DOESNOT have. valid values 0 or 1

  node *n;
  Stack s;  // can have this allocated in heap also
  s.push(root);

  do{        // do a pre order traversal using stack 

       n = s.pop();
       if (n -> right) s.push( n -> right);
       else flag++;

       if( n -> left) s.push( n -> left);
       else flag ++;

       if(parentFlag && flag != 2) // parent node has only one child and that child is not leaf. height not balanced.
         return 0;
       parentFlag = (flag == 1 ? 1 : 0) ; // parent flag is set only if flag is 1
       flag = 0 ;

   } while (!s.empty);

   return 1;

}

Sudharsanan
ok..I just realized that this solution will not work if all nodes have 2 children but the leaves are at different levels! ignore it..
Sudharsanan