views:

365

answers:

6

For a given binary tree, largest binary search sub-tree should be found.

Example:

Input:

                   10
               /         \
             50           150
            /  \         /   \
          25    75     200    20
         / \   / \    /  \    / \
        15 35 65 30  120 135 155 250 

Output:

                   50
                  /   \
                 25   75
                / \   /
               15 35  65
+1  A: 

A binary search tree will give you a sorted result if you do a IN-ORDER Traversal. So, do an in-order traversal for the entire binary tree. The longest sorted sequence is your largest binary search sub tree.

  • Do a inorder traversal of elements (VISIT LEFT, VISIT ROOT, VISIT RIGHT)
  • While doing so, get the node data, compare whether the previous node data is lesser than the next data. If so, increment counter by 1. Store the start node.
  • When the comparison fails, store the end node and reset counter to 0
  • Store this information (counter,start,end) node in an array structure to later find which is having the maximum value and that will give you the longest binary search sub tree
Bragboy
I might be missing something, but I don't think this works. Consider the example posted by the OP, specifically the subtree rooted at `50`. Consider replacing each value except `35` and `50` with something that completely breaks the BST property. An inorder traversal would find `35 50` in sorted order, which would mean nothing at all. Can you detail how you would avoid this?
IVlad
@IVlad, I do not understand.. How would 35 50 will give the result in sorted order if you break the BST property?? I guess you're trying me to point a big hole in my logic, please explain me.
Bragboy
@Bragboy - replace `75` with `20` in the subtree rooted at `50` in the OP's example. What does your algorithm output? Your algorithm is only correct if it doesn't consider `65`, otherwise it will output a disconnected tree. So how do you know to ignore `65`?
IVlad
+2  A: 

Interesting question!

My earlier attempt was moronically wrong!

Here is another attempt (hopefully correct this time).

I am assuming the tree is connected.

Suppose for each node n of the tree, you had a set of descendants of n, Sn with the property that

  • For each member x of Sn, the unique path from n to x is a Binary Search Tree (it is only a path, but you can still consider it a tree).

  • For every descendant y of x, such that the path from n to y is a BST, y is in Sn.

The set of nodes Sn, gives you the largest BST rooted at n.

We can construct Sn for each node by doing a depth first search on the tree, and passing in the path information (the path from root to the current node) and updating the sets of the nodes in the path by backtracking along the path.

When we visit a node, we walk up the path, and check if the BST property is satisfied for that segment of the path walked up so far. If so, we add the current node to the corresponding set of the node of the path we just walked to. We stop walking the path the moment the BST property is violated. Checking if the path segment we walked so far is a BST can be done in O(1) time, for a O(path_length) time total processing time, for each node.

At the end, each node will have its corresponding Sn populated. We can walk the tree now and pick the node with the largest value of Sn.

The time taken for this is the sum of depths of the nodes (in the worst case), and that is O(nlogn) in the average case (see Section 5.2.4 of http://www.toves.org/books/data/ch05-trees/index.html), but O(n^2) in the worst case.

Perhaps a cleverer way to update the sets will guarantee a reduction in the worst case time.

The pseudo-code could be something like:

static Tree void LargestBST(Tree t)
{
    LargestBST(t, new List<Pair>());
    // Walk the tree and return the largest subtree with max |S_n|.
}

static Tree LargestBST(Tree t, List<Pair> path)
{
    if (t == null) return;

    t.Set.Add(t.Value);

    int value = t.Value;
    int maxVal = value;
    int minVal = value;

    foreach (Pair p in path)
    {
        if (p.isRight)
        {
            if (minVal < p.node.Value)
            {
                break;
            }
        }

        if (!p.isRight)
        {
            if (maxVal > p.node.Value)
            {
                break;
            }
        }

        p.node.Set.Add(t.Value);

        if (p.node.Value <= minVal)
        {
            minVal = p.node.Value;
        }

        if (p.node.Value >= maxVal)
        {
            maxVal = p.node.Value;
        }
    }

    Pair pl = new Pair();
    pl.node = t;
    pl.isRight = false;

    path.Insert(0, pl);
    LargestBST(t.Left, path);

    path.RemoveAt(0);

    Pair pr = new Pair();
    pr.node = t;
    pr.isRight = true;

    path.Insert(0, pr);

    LargestBST(t.Right, path);

    path.RemoveAt(0);

}
Moron
We don't know that the all the elements of the left subtree are smaller than the root (same for right subtree being larger). For instance, in OP's example, if we replaced `35` with `60` then largest BST containing `25` has 3 elements and for `75` has 2 elements, but for `50` does not have 3+2+1=6 elements. A recursive solution similar to this will work, though, if we introduce upper/lower bounds for each subtree call.
BlueRaja - Danny Pflughoeft
@BlueRaja: You are right. Maybe I should have tried proving it! :-)
Moron
@BlueRaja: I have update the answer with a different approach.
Moron
Nice. It's more or less equivalent to the algorithm I just posted (which I wrote before reading this, I swear!). Or at least, they both are `O(n log n)` for balanced trees and `O(n^2)` worst-case for non-balanced, and both rely on (converses of) the same fact.
BlueRaja - Danny Pflughoeft
@BlueRaja: Not just balanced trees. If you take a random tree, the expected sum of depths is < 2H_n, where H_n is the nth harmonic number. So I suppose it is reasonable, just like quicksort :-).
Moron
Err.. I meant 2nH_n, not 2H_n.
Moron
A: 
GetLargestSortedBinarySubtree(thisNode, ref OverallBestTree)
    if thisNode == null
        Return null
    LeftLargest = GetLargestSortedBinarySubtree(thisNode.LeftNode, ref OverallBestTree)
    RightLargest = GetLargestSortedBinarySubtree(thisNode.RightNode, ref OverallBestTree)
    if LeftLargest.Max < thisNode.Value & RightLargest.Min > thisNode.Value
        currentBestTree = new BinaryTree(LeftLargest, thisNode.Value, RightLargest)
    else if LeftLargest.Max < thisNode.Value
        currentBestTree = new BinaryTree(LeftLargest, thisNode.Value, null)
    else if RightLargest.Min > thisNode.Value
        currentBestTree = new BinaryTree(null, thisNode.Value, RightLargest)
    else
        currentBestTree = new BinaryTree(null, thisNode.Value, null)
    if (currentBestTree.Size > OverallBestTree.Size)
        OverallBestTree = currentBestTree
    return currentBestTree

As BlueRaja pointed out, this algorithm is not correct.

It should really be called GetLargestSortedBinarySubtreeThatCanBeRecursivelyConstructedFromMaximalSortedSubtrees.

Jeffrey L Whitledge
Fails in the same case as my comment to M's solution (replace `35` with `60`), though for different reasons - the best BST containing the root may contain *part* of the best left-BST, but not necessarily all of it. This algorithm would give a tree of size 3 (rooted at `25`) rather than size 5 (rooted at `50`) for that case.
BlueRaja - Danny Pflughoeft
@BlueRaja - Ah, yes!
Jeffrey L Whitledge
+3  A: 

The previous algorithm (see revisions) was O(n^2) - we can generalize it to O(n log n) by noticing the facts that:

  1. If b is the root of the largest BST and b.left.value < b.value, then b.left is also in the BST (same for b.right.value ≥ b.value)
  2. If b is the root of the largest BST and a is also in the BST, then every node between a and b is in the BST.

So if c is between a and b, and c is not in the BST rooted by b, neither is a (due to (2.)). Using this fact, we can easily determine if a node is in the BST rooted by any given ancestor. We'll do this by passing a node into our function along with a list of its ancestors, and the associated min/maxValues that the current child-node would have to satisfy if indeed that ancestor were the root of the largest BST (we'll call this list ancestorList). We'll store the entire collection of potential-roots in overallRootsList

Let's define a structure called potentialRoot as follows:

Each potentialRoot contains the following values:
* node: The node we are considering for the root of the BST
* minValue and maxValue: the range another node must fall between to be part of the BST rooted by node (different for every node)
* subNodes: A list of the rest of the nodes in the largest BST rooted by node

The pseudo code looks like this (note that all lists mentioned are lists of potentialRoots):

FindLargestBST(node, ancestorList):
    leftList, rightList = empty lists
    for each potentialRoot in ancestorList:
        if potentialRoot.minValue < node.Value ≤ potentialRoot.maxValue:
            add node to potentialRoot.subNodes (due to (1.))
            (note that the following copies contain references, not copies, of subNodes)
            add copy of potentialRoot to leftList, setting maxValue = node.Value
            add copy of potentialRoot to rightList, setting minValue = node.Value

    add the potentialRoot (node, -∞, +∞) to leftList, rightList, and overallRootsList
    FindLargestBST(node.left, leftList)
    FindLargestBST(node.right, rightList)

At the end overallRootsList will be a list of n potentialRoots, each with a list of subNodes. The one with the largest subNodes list is your BST.

Since there are < treeHeight values in ancestorList, then (assuming the tree is balanced), the algorithm runs in O(n log n)

BlueRaja - Danny Pflughoeft
Isn't this `O(n^2)`? It works, but this (for me anyway, I'll admit it might help the OP) is pretty trivial, something in `O(n)` would be a lot nicer :). +1 because it does answer the question however.
IVlad
@IVlad: Yes, this is `O(n^2)`, but is the only method posted so far that works. If you find something better, please post it!
BlueRaja - Danny Pflughoeft
The asymtotic complexity of this algorithm is outragous! But +1 anyway for being *correct*. :-) I can't imagine any way to make this problem tractable (with the possible subtrees ∝ 2^N ).
Jeffrey L Whitledge
@IVlad, @BlueRaja - I think this is much worse than n^2. I think this problem is exponential. Each additional leaf node will double the problem size (since it's the target of a recursive call within a recursive call). The details are complicated (and I am not knowledgeable to calculate the exact values), but a clear minimum complexity is based on the fact that each leaf node is either in or out of the solution, thus 2^m possibilities for m leaf nodes.
Jeffrey L Whitledge
@Jeffrey - I don't think so. Notice that there are `n` recursive calls to `LargestBSTIncludingNode` in the `LargestBST` function (one for each node). Then in `LargestBSTIncludingNode`, each node is visited once only. This looks `O(n^2)` to me.
IVlad
@IVlad - OK. Perhaps you're right. I guess there is some divide-and-conquer going on.
Jeffrey L Whitledge
@Jeffrey - it's pretty easy to test, just run it on a random (ideally complete, but this is pretty hard to generate randomly, so just run the test multiple times and don't bother) binary search tree with > 100 nodes. If it takes forever you're right, otherwise you're not. This is fairly accurate because there's definitely no input-based pruning going on for when the tree is a BST: if the input is a BST, that's the worst case.
IVlad
@IVlad - Nah, I prefer theory over experimentation, which is why I believe that heavier objects fall faster than lighter ones. :-D
Jeffrey L Whitledge
@IVlad: See my edit, and also @M's edited post below
BlueRaja - Danny Pflughoeft
@BlueRaja - One thing I don't get: you say at the end you will have a list of `n` potential roots, but then you say there are `< treeHeight` values in `listOfPotentialRoots`. Which one is it? because these two statements sort of contradict each other.
IVlad
@IVlad: I meant to imply that you keep a list of every potentialRoot created along the way. I've edited to make that more clear.
BlueRaja - Danny Pflughoeft
A: 
root(Tree L A R) = A

MaxBST(NULL) = (true, 0, NULL)
MaxBST(Tree L A R as T) = 
  let
    # Look at both children
    (L_is_BST, L_size, L_sub) = MaxBST(L)
    (R_is_BST, R_size, R_sub) = MaxBST(R)
  in
  # If they're both good, then this node might be good too
  if L_is_BST and R_is_BST and (L == NULL or root(L) < A) and (R == NULL or A < root(R))
  then (true, 1 + L_size + R_size, T)
  else
       # This node is no good, so give back the best our children had to offer
       (false, max(L_size, R_size), if L_size > R_size then L_sub else R_sub)

Looks at each tree node exactly once, so runs in O(N).

Edit: Crud, this doesn't consider that it can leave out some parts of a subtree. When I read subtree, I assumed "the entire tree rooted at some node". I may come back to fix this later.

Novelocrat
+4  A: 

This answer previously contained an O(n log n) algorithm based on link/cut trees. Here is a simpler O(n) solution.

The core is a procedure that accepts a node, the unique maximum BSST rooted at its left child, the unique maximum BSST rooted at its right child, and pointers to the left-most and right-most elements of these BSSTs. It destroys its inputs (avoidable with persistent data structures) and constructs the unique maximum BSST rooted at the given node, together with its minimum and maximum elements. All BSST nodes are annotated with the number of descendants. As before, this procedure is called repeatedly from a post-order traversal. To recover the sub-tree, remember the root of the largest BSST; reconstructing it requires only a simple traversal.

I'll treat the left BSST only; the right is symmetric. If the root of the left BSST is greater than the new root, then the entire sub-tree is removed, and the new root is now left-most. Otherwise, the old left-most node is still left-most. Starting from the right-most node of the left BSST and moving upward, find the first node that is less than or equal to the root. Its right child must be removed; note now that due to the BST property, no other nodes need to go! Proceed to the root of the left BSST, updating the counts to reflect the deletion.

The reason this is O(n) is that in spite of the loop, each edge in the original tree is in essence traversed only once.


EDIT: collectively, the paths traversed are the maximal straight-line paths in a BST, except for the left spine and the right spine. For example, on the input

              H
             / \
            /   \
           /     \
          /       \
         /         \
        /           \
       /             \
      D               L
     / \             / \
    /   \           /   \
   /     \         /     \
  B       F       J       N
 / \     / \     / \     / \
A   C   E   G   I   K   M   O

here are the recursive calls on which each edge is traversed:

              H
             / \
            /   \
           /     \
          /       \
         /         \
        /           \
       /             \
      D               L
     / h             h \
    /   h           h   \
   /     h         h     \
  B       F       J       N
 / d     d h     h l     l \
A   C   E   G   I   K   M   O
Link to paper on link/cut trees: http://www.cs.cmu.edu/~sleator/papers/dynamic-trees.pdf
BlueRaja - Danny Pflughoeft
Doesn't 'proceed to root updating counts' mean we traverse edges more than once?
Moron
+1. I just realized that you don't have to update the counts along the intermediate nodes along the path to root (basically I misunderstood what you wrote). This looks right to me.
Moron
Looks correct to me as well @throwawayacct - I hope you will continue to share your knowledge, rather than really throwing away your account :)
BlueRaja - Danny Pflughoeft