views:

145

answers:

5

I am reading through the binary tree delete node algorithm used in the book Data Structures and Algorithms: Annotated Reference with Examples

on page 34 , case 4(Delete node which has both right and left sub trees), following algorithm described in the book looks doesn't work, probably I may be wrong could someone help me what I am missing.

//Case 4
get largestValue from nodeToRemove.Left
FindParent(largestValue).Right <- 0
nodeToRemove.Value<-largestValue.Value

How does the following line deletes the largest value from sub tree FindParent(largestValue).Right <- 0

+1  A: 

The idea is to simply take the value from the largest node on the left hand side and move it to the node that is being deleted, i.e., don't delete the node at all, just replace it's contents. Then you prune out the node with the value you moved into the "deleted" node. This maintains the tree ordering with every node's value larger than all of it's left children and smaller than all of it's right children.

tvanfosson
What if the left subtree has only one node? Won't `FindParent(largestValue).Right <- 0` delete the wrong node?
Passionate programmer
@Passionate programmer -- you'd certainly have to ensure that the parent is not the same as the node being removed. Hopefully they mentioned this in the text -- I don't have a copy so I don't know.
tvanfosson
+1  A: 

If I understand the pseudo-code, it works in the general case, but fails in the "one node in the left subtree" case. Nice catch.

It effectively replaces the node_to_remove with largest_value from it's left subtree (also nulls the old largest_value node).

Note that in a BST, the left subtree of node_to_remove will be all be smaller than node_to_remove. The right subtree of node_to_remove will all be larger than node_to_remove. So if you take the largest node in the left subtree, it will preserve the invariant.

If this is a "one node in the subtree case", it'll destroy the right subtree instead. Lame :(

As Vivin points out, it also fails to reattach left children of largestNode.

Stephen
+3  A: 

When deleting a node with two children, you can either choose its in-order successor node or its in-order predecessor node. In this case it's finding the the largest value in the left sub-tree (meaning the right-most child of its left sub-tree), which means that it is finding the node's in-order predecessor node.

Once you find the replacement node, you don't actually delete the node to be deleted. Instead you take the value from the successor node and store that value in the node you want to delete. Then, you delete the successor node. In doing so you preserve the binary search-tree property since you can be sure that the node you selected will have a value that is lower than the values of all the children in the original node's left sub-tree, and greater that than the values of all the children in the original node's right sub-tree.

EDIT

After reading your question a little more, I think I have found the problem.

Typically what you have in addition to the delete function is a replace function that replaces the node in question. I think you need to change this line of code:

FindParent(largestValue).Right <- 0

to:

FindParent(largestValue).Right <- largestValue.Left

If the largestValue node doesn't have a left child, you simply get null or 0. If it does have a left child, that child becomes a replacement for the largestValue node. So you're right; the code doesn't take into account the scenario that the largestValue node might have a left child.

Another EDIT

Since you've only posted a snippet, I'm not sure what the context of the code is. But the snippet as posted does seem to have the problem you suggest (replacing the wrong node). Usually, there are three cases, but I notice that the comment in your snippet says //Case 4 (so maybe there is some other context).

Earlier, I alluded to the fact that delete usually comes with a replace. So if you find the largestValue node, you delete it according to the two simple cases (node with no children, and node with one child). So if you're looking at pseudo-code to delete a node with two children, this is what you'll do:

get largestValue from nodeToRemove.Left
nodeToRemove.Value <- largestValue.Value

//now replace largestValue with largestValue.Left    

if largestValue = largestValue.Parent.Left then   
   largestValue.Parent.Left <- largestValue.Left //is largestValue a left child?
else //largestValue must be a right child
   largestValue.Parent.Right <- largestValue.Left

if largestValue.Left is not null then
   largestValue.Left.Parent <- largestValue.Parent

I find it strange that a Data Structures And Algorithms book would leave out this part, so I am inclined to think that the book has further split up the deletion into a few more cases (since there are three standard cases) to make it easier to understand.

To prove that the above code works, consider the following tree:

  8
 / \
7   9

Let's say that you want to delete 8. You try to find largestValue from nodeToRemove.Left. This gives you 7 since the left sub-tree only has one child.

Then you do:

nodeToRemove.Value <- largestValue.Value

Which means:

8.value <- 7.Value

or

8.Value <- 7

So now your tree looks like this:

  7
 / \
7   9

You need to get rid of the replacement node and so you're going to replace largestValue with largestValue.Left (which is null). So first you find out what kind of child 7 is:

if largestValue = largestValue.Parent.Left then

Which means:

if 7 = 7.Parent.Left then

or:

if 7 = 8.Left then

Since 7 is 8's left child, need to replace 8.Left with 7.Right (largestValue.Parent.Left <- largestValue.Left). Since 7 has no children, 7.Left is null. So largestValue.Parent.Left gets assigned to null (which effectively removes its left child). So this means that you end up with the following tree:

  7
   \
    9
Vivin Paliath
I think your solution still doesn't handle the case where largestValue is the same as nodeToRemove.Left
Kathy Van Stone
@Kathy yup, I traced out the code and came to the same conclusion. As it stands, the code in the question replaces the wrong node. I've updated my solution. The code in the question might be addressing a specific case.
Vivin Paliath
A: 

It may make more sense when you look at the Wikipedia's take on that part of the algorithm:

Deleting a node with two children: Call the node to be deleted "N". Do not delete N. Instead, choose either its in-order successor node or its in-order predecessor node, "R". Replace the value of N with the value of R, then delete R. (Note: R itself has up to one child.)

Note that the given algorithm chooses the in-order predecessor node.

Edit: what appears to be missing the possibility that R (to use Wikipedia's terminology) has one child. A recursive delete might work better.

Kathy Van Stone
What if the left subtree has only one node? Won't FindParent(largestValue).Right <- 0 delete the wrong node?
Passionate programmer
@Passionate programmer check out the edit to my solution.
Vivin Paliath
@Vivin The discussion on what appears to be missing refers the original problem, which did assume that the equivalent to R has no children. I wasn't referring to problems in Wikipedia's solution
Kathy Van Stone
@Passionate programmer One reason why I suggest the (implicit) recursive delete call is that it checks all the possibilities, including the left subtree being only one node (which would then just be deleted). The recursive call would never encounter this case, so it should return quickly.
Kathy Van Stone
@Kathy - my mistake :) I realized that later after I read the question again. Deleting my original comment.
Vivin Paliath
+1  A: 

I think you may need to clarify what doesn't work.

I will try and explain the concept of deletion in a binary tree in case this helps.

Lets assume that you have a node in the tree that has two child nodes that you wish to delete. in the tree below lets say that you want to delete node b
           a
         /     \
       b       c
     /   \     /  \
   d     e   f     g

When we delete a node we need to reattach its dependant nodes.

ie. When we delete b we need to reattach nodes d and e.

We know that the left nodes are less than the right nodes in value and that the parent nodes are between the left and right node s in value. In this case d < b and b < e. This is part of the definition of a binary tree.

What is slightly less obvious is that e < a. So this means that we can replace b with e. Now we have reattached e we need to reattach d.

As stated before d < e so we can attach e as the left node of e.

The deletion is now complete.

( btw The process of moving a node up the tree and rearranging the dependant nodes in this fashion is known as promoting a node. You can also promote a node without deleting other nodes.)


           a
         /     \
       d       c
         \     /  \
          e   f     g

Note that there is another perfectly legitimate outcome of deleteing node b. If we chose to promote node d instead of node e the tree would look like this.


           a
         /     \
       e       c
     /        /  \
   d         f     g

developer