views:

83

answers:

4

I have multiple binary trees stored as an array. In each slot is either nil (or null; pick your language) or a fixed tuple storing two numbers: the indices of the two "children". No node will have only one child -- it's either none or two.

Think of each slot as a binary node that only stores pointers to its children, and no inherent value.

Take this system of binary trees:

    0       1
   / \     / \
  2   3   4   5
 / \         / \
6   7       8   9
   / \
 10   11

The associated array would be:

   0       1       2      3     4      5      6      7        8     9     10    11
[ [2,3] , [4,5] , [6,7] , nil , nil , [8,9] , nil , [10,11] , nil , nil , nil , nil ]

I've already written simple functions to find direct parents of nodes (simply by searching from the front until there is a node that contains the child)

Furthermore, let us say that at relevant times, both all trees are anywhere between a few to a few thousand levels deep.

I'd like to find a function

P(m,n)

to find the lowest common ancestor of m and n -- to put more formally, the LCA is defined as the "lowest", or deepest node in which have m and n as descendants (children, or children of children, etc.). If there is none, a nil would be a valid return.

Some examples, given our given tree:

P( 6,11)   # => 2
P( 3,10)   # => 0
P( 8, 6)   # => nil
P( 2,11)   # => 2

The main method I've been able to find is one that uses an Euler trace, which turns the given tree (Adding node A as the invisible parent of 0 and 1, with a "value" of -1), into:

A-0-2-6-2-7-10-7-11-7-2-0-3-0-A-1-4-1-5-8-5-9-5-1-A

And from that, simply find the node between your given m and n that has the lowest number; For example, to find P(6,11), look for a 6 and an 11 on the trace. The number between them that is the lowest is 2, and that's your answer. If A (-1) is in between them, return nil.

 -- Calculating P(6,11) --

A-0-2-6-2-7-10-7-11-7-2-0-3-0-A-1-4-1-5-8-5-9-5-1-A
      ^ ^        ^
      | |        |
      m lowest   n

Unfortunately, I do believe that finding the Euler trace of a tree that can be several thousands of levels deep is a bit machine-taxing...and because my tree is constantly being changed throughout the course of the programming, every time I wanted to find the LCA, I'd have to re-calculate the Euler trace and hold it in memory every time.

Is there a more memory efficient way, given the framework I'm using? One that maybe iterates upwards? One way I could think of would be the "count" the generation/depth of both nodes, and climb the lowest node until it matched the depth of the highest, and increment both until they find someone similar.

But that'd involve climbing up from level, say, 3025, back to 0, twice, to count the generation, and using a terribly inefficient climbing-up algorithm in the first place, and then re-climbing back up.

Are there any other better ways?


Clarifications

In the way this system is built, every child will have a number greater than their parents.

This does not guarantee that if n is in generation X, there are no nodes in generation (X-1) that are greater than n. For example:

        0
       / \
      /   \
     /     \
    1       2        6
   / \     / \      / \
  2   3   9  10    7   8
     / \              / \
    4   5            11 12

is a valid tree system.

Also, an artifact of the way the trees are built are that the two immediate children of the same parent will always be consecutively numbered.

+1  A: 

Are the nodes in order like in your example where the children have a larger id than the parent? If so, you might be able to do something similar to a merge sort to find them.. for your example, the parent tree of 6 and 11 are:

6  -> 2 -> 0
11 -> 7 -> 2 -> 0

So perhaps the algorithm would be:

left = left_start
right = right_start

while left > 0 and right > 0
    if left = right
        return left
    else if left > right
        left = parent(left)
    else
        right = parent(right)

Which would run as:

left    right
----    -----
 6        11    (right -> 7)
 6        7     (right -> 2)
 6        2     (left -> 2)
 2        2     (return 2)

Is this correct?

FryGuy
This works...extremely well, and psuedo-linearly to the distance the two nodes are away from eachother, instead of to the size of the tree.It's pretty elegant, I do believe. Someone else just posted this solution, but you did it first. Thanks.I just...can't believe how strikingly simple this is.
Justin L.
+1  A: 

Maybe this will help: Dynamic LCA Queries on Trees.

Abstract:

Richard Cole, Ramesh Hariharan

We show how to maintain a data structure on trees which allows for the following operations, all in worst-case constant time. 1. Insertion of leaves and internal nodes. 2. Deletion of leaves. 3. Deletion of internal nodes with only one child. 4. Determining the Least Common Ancestor of any two nodes.

Conference: Symposium on Discrete Algorithms - SODA 1999

Moron
Thanks for the paper; I'll keep it in hand...
Justin L.
A: 

I've solved your problem in Haskell. Assuming you know the roots of the forest, the solution takes time linear in the size of the forest and constant additional memory. You can find the full code at http://pastebin.com/ha4gqU0n.

The solution is recursive, and the main idea is that you can call a function on a subtree which returns one of four results:

  • The subtree contains neither m nor n.
  • The subtree contains m but not n.
  • The subtree contains n but not m.
  • The subtree contains both m and n, and the index of their least common ancestor is k.

A node without children may contain m, n, or neither, and you simply return the appropriate result.

If a node with index k has two children, you combine the results as follows:

join :: Int -> Result -> Result -> Result
join _ (HasBoth k) _ = HasBoth k
join _ _ (HasBoth k) = HasBoth k
join _ HasNeither r = r
join _ r HasNeither = r
join k HasLeft HasRight = HasBoth k
join k HasRight HasLeft = HasBoth k

After computing this result you have to check the index k of the node itself; if k is equal to m or n, you will "extend" the result of the join operation.

My code uses algebraic data types, but I've been careful to assume you need only the following operations:

  • Get the index of a node
  • Find out if a node is empty, and if not, find its two children

Since your question is language-agnostic I hope you'll be able to adapt my solution.

There are various performance tweaks you could put in. For example, if you find a root that has exactly one of the two nodes m and n, you can quit right away, because you know there's no common ancestor. Also, if you look at one subtree and it has the common ancestor, you can ignore the other subtree (that one I get for free using lazy evaluation).

Your question was primarily about how to save memory. If a linear-time solution is too slow, you'll probably need an auxiliary data structure. Space-for-time tradeoffs are the bane of our existence.

Norman Ramsey
Despite picking the other answer, I still found this one vey intriguing; I'll keep this in mind when I apply this to a more generalized data structure.
Justin L.
A: 

I think that you can simply loop backwards through the array, always replacing the higher of the two indices by its parent, until they are either equal or no further parent is found:

(defun lowest-common-ancestor (array node-index-1 node-index-2)
  (cond ((or (null node-index-1)
             (null node-index-2))
         nil)
        ((= node-index-1 node-index-2)
         node-index-1)
        ((< node-index-1 node-index-2)
         (lowest-common-ancestor array
                                 node-index-1
                                 (find-parent array node-index-2)))
        (t
         (lowest-common-ancestor array
                                 (find-parent array node-index-1)
                                 node-index-2))))
Svante
This works strikingly well, but someone said it first =/
Justin L.
Yeah, I overlooked the other one (which left out a base case, anyway).
Svante