
Hey folks,

I've got a unidirectional tree of objects, in which each object points to its parent. Given an object, I need to obtain its entire subtree of descendants as a collection of objects. The objects are not actually in any data structure, but I can easily get a collection of all the objects.

The naive approach is to examine each object in the batch, check whether the given object is one of its ancestors, and if so keep it aside. This is not very efficient: it costs O(N*N) in the worst case, where N is the number of objects.
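For reference, the naive scan might look roughly like the sketch below. The `Node` class and its `getParent()` method are hypothetical stand-ins for whatever the real objects expose. Each ancestor check walks up one parent chain, so the total cost is O(N * h) for tree height h, which becomes O(N*N) when the tree degenerates into a single chain.

```java
import java.util.ArrayList;
import java.util.List;

public class NaiveDescendants {
    // Hypothetical node type: each node only knows its parent (null for a root).
    static final class Node {
        private final String name;
        private final Node parent;
        Node(String name, Node parent) { this.name = name; this.parent = parent; }
        Node getParent() { return parent; }
        @Override public String toString() { return name; }
    }

    // True if 'ancestor' appears somewhere on n's chain of parents.
    static boolean hasAncestor(Node n, Node ancestor) {
        for (Node p = n.getParent(); p != null; p = p.getParent()) {
            if (p == ancestor) {
                return true;
            }
        }
        return false;
    }

    // Naive scan: walk every object's parent chain looking for 'target'.
    // Each walk is O(h) for tree height h, so the total is O(N * h),
    // which degrades to O(N * N) when the tree is one long chain.
    static List<Node> findDescendantsNaive(List<Node> all, Node target) {
        List<Node> result = new ArrayList<>();
        for (Node n : all) {
            if (hasAncestor(n, target)) {
                result.add(n);
            }
        }
        return result;
    }
}
```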

Another approach is the recursive one: search for the object's direct children, then repeat the process for the next level. Unfortunately, the tree is unidirectional, so there's no direct access to the children, and this would be only slightly less costly than the previous approach.
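The level-by-level idea can be sketched as follows, again with a hypothetical `Node` that only exposes `getParent()`. Each pass scans the whole batch to find the next generation, so the cost is O(N) per level of the subtree, i.e. O(N * h) overall. That is why it only helps a little over the naive scan.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LevelScanDescendants {
    // Hypothetical node type: each node only knows its parent (null for a root).
    static final class Node {
        private final String name;
        private final Node parent;
        Node(String name, Node parent) { this.name = name; this.parent = parent; }
        Node getParent() { return parent; }
        @Override public String toString() { return name; }
    }

    // Level-by-level search: each pass scans ALL objects for children of the
    // current frontier. One O(N) pass per level gives O(N * h) in total.
    static Set<Node> findDescendantsByLevel(List<Node> all, Node target) {
        Set<Node> result = new HashSet<>();
        Set<Node> frontier = new HashSet<>();
        frontier.add(target);
        while (!frontier.isEmpty()) {
            Set<Node> next = new HashSet<>();
            for (Node n : all) {
                // HashSet.contains(null) is safe, so roots simply never match
                if (frontier.contains(n.getParent())) {
                    next.add(n);
                }
            }
            result.addAll(next);
            frontier = next; // the children found become the next level
        }
        return result;
    }
}
```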

My question: Is there an efficient algorithm I'm overlooking here?

Thanks,

Yuval =8-)

A: 

Your question is a little abstract, but nested sets (scroll down; it might be a little too MySQL-specific) might be an option for you. The model is extremely fast for reads, though any modification is quite complex (on average, half the tree has to be updated).

That requires the ability to modify your data structure, though. And I guess if you can modify the structure, you could just as well add references to child objects. If you can't modify the structure, I doubt there's anything faster than your ideas.
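To make the nested-set idea concrete, here is a minimal in-memory sketch, assuming a hypothetical `Node` with `getParent()`. In a database the left/right numbers would live in columns on the table; here they are held in maps, which is my simplification. A depth-first walk gives each node a left number on entry and a right number on exit, so a node's descendants are exactly the nodes whose interval lies strictly inside its own. Keeping the nodes in a `TreeMap` keyed by left number turns "all descendants" into a single range read.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class NestedSetSketch {
    // Hypothetical node type: each node only knows its parent (null for a root).
    static final class Node {
        private final String name;
        private final Node parent;
        Node(String name, Node parent) { this.name = name; this.parent = parent; }
        Node getParent() { return parent; }
        @Override public String toString() { return name; }
    }

    // Nested-set numbers: a descendant's interval lies strictly inside its ancestor's.
    private final Map<Node, Integer> left = new HashMap<>();
    private final Map<Node, Integer> right = new HashMap<>();
    // Nodes ordered by left number, so a subtree is one contiguous key range.
    private final TreeMap<Integer, Node> byLeft = new TreeMap<>();

    NestedSetSketch(List<Node> all) {
        // Invert the parent pointers once so the tree can be walked top-down.
        Map<Node, List<Node>> children = new HashMap<>();
        Deque<Node> stack = new ArrayDeque<>();
        for (Node n : all) {
            if (n.getParent() == null) {
                stack.push(n); // roots start the depth-first walk
            } else {
                children.computeIfAbsent(n.getParent(), k -> new ArrayList<>()).add(n);
            }
        }
        // Iterative depth-first walk: a node gets its left number on the first
        // visit and its right number once all of its children are numbered.
        int counter = 0;
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (!left.containsKey(n)) {
                left.put(n, counter);
                byLeft.put(counter, n);
                counter++;
                stack.push(n); // revisit after the children to assign 'right'
                for (Node c : children.getOrDefault(n, List.of())) {
                    stack.push(c);
                }
            } else {
                right.put(n, counter++);
            }
        }
    }

    // O(1) ancestry test: d is below a iff a's interval strictly contains d's.
    boolean isDescendant(Node d, Node a) {
        return left.get(a) < left.get(d) && right.get(d) < right.get(a);
    }

    // All descendants of x: one range read on the left numbers.
    List<Node> descendants(Node x) {
        return new ArrayList<>(byLeft.subMap(left.get(x), false, right.get(x), false).values());
    }
}
```

The trade-off described above shows up directly: reads are cheap, but inserting or moving a node means renumbering every interval to its right.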

MattW.
A:

Databases work the same way, so do what databases do. Build up a hashtable that maps from parent to list-of-children. That takes O(n). Lookups and queries against that hashtable can then be a lot more efficient.
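In Java the parent-to-children hashtable can be built in one pass with streams. This is a sketch assuming a hypothetical `Node` with `getParent()`. Roots are filtered out first because `Collectors.groupingBy` throws a `NullPointerException` when the classifier returns a null key.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ChildrenIndex {
    // Hypothetical node type: each node only knows its parent (null for a root).
    static final class Node {
        private final String name;
        private final Node parent;
        Node(String name, Node parent) { this.name = name; this.parent = parent; }
        Node getParent() { return parent; }
        @Override public String toString() { return name; }
    }

    // One pass over all objects builds parent -> list of direct children.
    // Roots are dropped first because groupingBy rejects null keys.
    static Map<Node, List<Node>> childrenByParent(List<Node> all) {
        return all.stream()
                .filter(n -> n.getParent() != null)
                .collect(Collectors.groupingBy(Node::getParent));
    }
}
```

After that, `childrenByParent(all).getOrDefault(x, List.of())` gives the direct children of any `x`, and repeating the lookup level by level yields the whole subtree.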

Justice
More like O(n log n), but otherwise a good idea.
Nick Johnson
A: 

Building a tree where the objects point to their immediate children would probably be the best approach, especially if you need to do future look-ups. The cost of building that tree largely depends on the height of the original tree; at most, it would take O(n^2).

While you're building the tree, build a hashtable. The hashtable will make future searches for a particular object faster (O(1) vs. O(n)).
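One way to combine the two ideas is sketched below, assuming (hypothetically) that each object carries a string `id` and its parent's id, and that every parent id refers to an object in the list. It uses a record, so it needs Java 16+. Because the hashtable already exists while the tree is being built, each node can be attached to its parent with a single lookup, so building takes two linear passes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ChildTreeIndex {
    // Hypothetical input object: an id plus its parent's id (null for a root).
    record Item(String id, String parentId) {}

    // Mirror node that points DOWN to its children.
    static final class TreeNode {
        final Item item;
        final List<TreeNode> children = new ArrayList<>();
        TreeNode(Item item) { this.item = item; }
    }

    // id -> mirror node: expected O(1) lookup of any particular object.
    final Map<String, TreeNode> index = new HashMap<>();

    ChildTreeIndex(List<Item> items) {
        // First pass: create one mirror node per item.
        for (Item it : items) {
            index.put(it.id(), new TreeNode(it));
        }
        // Second pass: hang each node under its parent (assumes the parent exists).
        for (Item it : items) {
            if (it.parentId() != null) {
                index.get(it.parentId()).children.add(index.get(it.id()));
            }
        }
    }

    // Collect every descendant of the object with the given id (depth-first).
    List<Item> descendants(String id) {
        List<Item> out = new ArrayList<>();
        Deque<TreeNode> stack = new ArrayDeque<>(index.get(id).children);
        while (!stack.isEmpty()) {
            TreeNode n = stack.pop();
            out.add(n.item);
            n.children.forEach(stack::push);
        }
        return out;
    }
}
```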

R4Y
A:

As others have mentioned, build a hashtable/map of objects to a list of their (direct) children.

From there you can easily look up the list of direct children of your "target object", and then repeat the process for each object in that list.

Here's how I did it in Java, using generics and a queue instead of recursion:

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public static Set<Node> findDescendants(List<Node> allNodes, Node thisNode) {

    // keep a map of Nodes to a List of that Node's direct children
    Map<Node, List<Node>> map = new HashMap<Node, List<Node>>();

    // populate the map - this is O(n) since we examine each and every node
    // in the list
    for (Node n : allNodes) {
        Node parent = n.getParent();
        if (parent != null) {
            List<Node> children = map.get(parent);
            if (children == null) {
                // first child seen for this parent: instantiate its list
                children = new ArrayList<Node>();
                map.put(parent, children);
            }
            children.add(n);
        }
    }

    // now, create a collection of thisNode's descendants (of all levels)
    Set<Node> allChildren = new HashSet<Node>();

    // keep a FIFO queue of nodes to look at; ArrayDeque removes from the
    // head in O(1), unlike ArrayList.remove(0), which shifts every element
    Deque<Node> nodesToExamine = new ArrayDeque<Node>();
    nodesToExamine.add(thisNode);

    while (!nodesToExamine.isEmpty()) {
        // pop a node off the front of the queue
        Node node = nodesToExamine.poll();

        List<Node> children = map.get(node);
        if (children != null) {
            for (Node c : children) {
                allChildren.add(c);
                nodesToExamine.add(c);
            }
        }
    }

    return allChildren;
}

The expected execution time is O(n). Building the map looks at every node in the list exactly once, and the traversal then looks at each descendant of your node once more. In the worst case (running the algorithm on the root node) you look at every node in the list twice, about 2n steps, which is still O(n).

matt b