tags:

views:

269

answers:

10

I iterate through a tree structure to collect the paths of the leaf nodes. Which way do you prefer to collect the result of the operation:

a) merge the results of the children and return this

private Collection<String> extractPaths(final Element element, final IPath parentPath) {
    final IPath path = parentPath.append(element.getLabel());
    final Collection<Element> children = getElementChildren(element);
    if (children.isEmpty())
        return Collections.singletonList(path.toString());

    final Set<String> result = new TreeSet<String>();
    for (final Element child : children)
        result.addAll(extractPaths(child, path));
    return result;
}

b) provide the result collection as a parameter and add new elements in each recursion step

private void extractPaths(final Element element, final IPath parentPath, final Set<String> result) {
    final IPath path = parentPath.append(element.getLabel());
    final Collection<Element> children = getElementChildren(element);
    if (children.isEmpty())
        result.add(path.toString());

    for (final Element child : children)
       extractPaths(child, path, result);
}
+5  A: 

Both can be used without any problems. Though, former solution is more clean since it doesn't change input parameters. No side effects is in the nature of functional programming.

Boris Pavlović
No side effects apart from the overhead of creating and destroying lots of collections, and copying data between collections at each stage. Your advice is good when referring to a functional language. Java ... isn't.
JeeBee
Agree that Java isn't functional language but recursion is at heart of functional style of programming. I know, it can be used in imperative programming...
Boris Pavlović
+1  A: 

I usually prefer to return the result, since i think

$result = extractPaths($arg,$arg2);

is more clear than

extractPaths($arg,$arg2,$result);

but it's entirely based on taste.

Silfverstrom
+1  A: 

I would choose option b, since it would create fewer objects and thereby be more efficient. Solution a feels more like the way you would do it in a functional language, but that relies on assumptions that don't hold in Java.

tomjen
+1  A: 

If you pass in the object to be built, if you had an exception that you caught in a place where you had a reference to that object, then you would at least have the data you built up until the exception was thrown.

I personally pass in Builders as arguments when multiple methods will be "building" on it, including recursion. This way you only have a single object being built, and miss out lots of Set, Map or List copying.

JeeBee
+1  A: 

in this specific case I prefer the latter solution since:

  1. it avoids creating throw-away collections
  2. your algorithm implemented in this way cannot get any gain from being "functional"

imho there is no real benefit of being functional without a really good reason f (e.g. using threads).

dfa
no reason other than clear readability and being able to reason with the code better?
Chii
readability is subjective no?
dfa
+6  A: 

I assume the latter is meant to call extractPaths(child, path, result)?

The latter form will be more efficient, as it doesn't need to copy items at every level of recursion. As Boris says, it's less functionally clean - but Java doesn't really provide immutable collections with appropriate methods to create new collections based on them efficiently.

In terms of making it pleasant to call, you could provide a wrapper in the style of the first option which just creates a new set and calls the second option. That's probably what I'd do:

private Collection<String> extractPaths(Element element, IPath parentPath) {
    Set<String> ret = new HashSet<String>();
    extractPaths(element, parentPath, ret);
    return ret;
}

Another alternative is to change the third parameter from a Set<String> to some sort of "collector" interface: you tell it that you've found a result, without specifying what to do with it. Indeed, the collector could return a new collector to use from then on - leaving it up to the implementation to decide whether to make a functionally-clean "create a new set" version, or hide side-effects in the collector which would just return itself again for reuse.

Jon Skeet
+3  A: 

To provide the most convenient and flexible interface to your clients, write it as a class that implements Iterator<E>.

This means that the client can loop through the items found during the recursion, but they don't have to implement their "for each" code as a callback (Java doesn't have a pretty way to do that), and they can even "pause" the operation and continue it later, outside of the scope in which they began it (or abandon it at any point).

It's the trickiest to implement though. If the data structure you're traversing is a tree-like structure with parent pointers in each node then you need no other data than the current node. To get to the next node, look for a first child. If there is one, that's the next node. Otherwise try the next sibling. If there isn't one, get the parent and try to get its next sibling, and so on until you hit a null in which case there are no more items.

As a quick and dirty example, here's a treenode-like class, breaking all the rules about encapsulation to save some space here:

class SimpleNode
{
    String name;
    public SimpleNode parent, firstChild, nextSibling;

    public SimpleNode(String n) { name = n; }

    public void add(SimpleNode c)
    {
        c.parent = this;
        c.nextSibling = firstChild;
        firstChild = c;
    }

    public String getIndent()
    {
        StringBuffer i = new StringBuffer();

        for (SimpleNode n = this; n != null; n = n.parent)
            i.append("  ");

        return i.toString();
    }
}

Now let's create a tree from it:

SimpleNode root = new SimpleNode("root");

SimpleNode fruit = new SimpleNode("fruit");
root.add(fruit);
fruit.add(new SimpleNode("pear"));
fruit.add(new SimpleNode("banana"));
fruit.add(new SimpleNode("apple"));

SimpleNode companies = new SimpleNode("companies");
root.add(companies);
companies.add(new SimpleNode("apple"));
companies.add(new SimpleNode("sun"));
companies.add(new SimpleNode("microsoft"));

SimpleNode colours = new SimpleNode("colours");
root.add(colours);
colours.add(new SimpleNode("orange"));
colours.add(new SimpleNode("red"));
colours.add(new SimpleNode("blue"));

Now, to spell this out for anyone new to this idea, what we want to be able to do is this:

for (final SimpleNode n : new SimpleNodeIterator(root))
    System.out.println(n.getIndent() + "- " + n.name);

And get this (I've made the above code generate something that looks like a hierarchical bullet list in SO):

  • root
    • colours
      • blue
      • red
      • orange
    • companies
      • microsoft
      • sun
      • apple
    • fruit
      • apple
      • banana
      • pear

To do this, we have to map some standard operations onto our SimpleNode class:

class SimpleNodeIterator extends TreeIterator<SimpleNode>
{
    public SimpleNodeIterator(SimpleNode root)
        { super(root); }

    protected SimpleNode getFirstChild(SimpleNode of)
        { return of.firstChild; }

    protected SimpleNode getNextSibling(SimpleNode of)
        { return of.nextSibling; }

    protected SimpleNode getParent(SimpleNode of)
        { return of.parent; }
}

And finally, at the bottom of our design, TreeIterator<TNode> is a very reusable abstract base class that does the rest, now we've told it how to navigate our node class:

abstract class TreeIterator<TNode> implements Iterator<TNode>,
                                              Iterable<TNode>
{
    private TNode _next;

    protected TreeIterator(TNode root)
        { _next = root; }

    public Iterator<TNode> iterator()
        { return this; }

    public void remove()
        { throw new UnsupportedOperationException(); }

    public boolean hasNext()
        { return (_next != null); }

    public TNode next()
    {
        if (_next == null)
            throw new NoSuchElementException();

        TNode current = _next;

        _next = getFirstChild(current);

        for (TNode ancestor = current;
             (ancestor != null) && (_next == null);
             ancestor = getParent(ancestor))
        {
            _next = getNextSibling(ancestor);
        }

        return current;
    }

    protected abstract TNode getFirstChild(TNode of);
    protected abstract TNode getNextSibling(TNode of);
    protected abstract TNode getParent(TNode of);
}

(It's mildly naughty in that it implements Iterator<E> and Iterable<E> on the same object. This just means that you have to new up a fresh object in order to iterate a second time; don't try to reuse the same object).

This means that if your hierarchical structure consists of nodes for which you can define those three simple navigational operations, then all you have to do is derive your own equivalent of SimpleNodeIterator. This makes it very easy to enable this capability on any tree implementation.

If what you're iterating doesn't have a way to get the parent, you need to keep a stack during the iteration. Each time you descend a level, you push the state for the current level onto the stack. When you finish iterating at the current level, you pop the last state off the stack and continue with it. When the stack is empty, you're done. This means you have some intermediate storage, but its maximum size is proportional to the depth of the recursion rather than the number of items, so assuming the data is roughly balanced then it should be a lot more storage-efficient than copying all the items to a list before you return it.

Daniel Earwicker
This is an interesting approach. Although I think it is harder to understand what is going on since it adds more complexity. I need this traversal only at one point in the application so I would prefer a simple method call for clarity.
ftl
I'd recommend defining the pattern once in an abstract class so you can reuse it with less effort. I'll post an exmaple of what I mean if I have time later.
Daniel Earwicker
Added implementation for situations where no additional stack is needed.
Daniel Earwicker
+1, great stuff.
JoshJordan
+1  A: 

pass a collection as parameter for this method

Villa
A: 

Later will create less objects in memory (as already said) but also manages each tree path only once: when extracted and stored in the Set result it is not 'addedAll' to any other set again and again and again.

ignasi35
A: 

The final solution I found after some refactoring is to implement variant b) but to pass a Visitor instead of the result collection:

private void traverse(final Element element, final Visitor... visitors) {
   for (final Visitor visitor : visitors)
       // push e.g. the parent path to the stack
       visitor.push(visitor.visit(element)); 

   for (final Element child: getElementChildren(element))
       traverse(child, visitors);

   for (final Visitor visitor : visitors)
       visitor.pop();
}

The Visitor provides also a stack to carry the information about the parent path. This solution allows me to separate the traversal logic from the collection logic, without the need of the more complex TreeIterator implementation.

private class CollectPathsVisitor extends ElementVisitor {
    public final Set<String> paths = new TreeSet<String>();
    public Object visit(Element element) {
        final IPath parentPath = (IPath) peek();
        final IPath path = parentPath.append(element.getLabel());
        if (!hasChildren(element))
            paths.add(path);
        return path;
    }
}
ftl