views:

77

answers:

3

I'm writing a function which generates all paths in a tree as xpath statements and storing them in a bag below is a naive (sorry this is long) and below that is my attempt to optimize it:

/**
 * Create the structural fingerprint of a tree. Defined as the multiset of
 * all paths and their multiplicities
 */
protected Multiset<String> createSF(AbstractTree<String> t,
  List<AbstractTree<String>> allSiblings) {
 /*
  * difference between unordered and ordered trees is that the
  * next-sibling axis must also be used
  * 
  * this means that each node's children are liable to be generated more
  * than once and so are memo-ised and reused
  */

 Multiset<String> res = new Multiset<String>();

  // so, we return a set containing:
  // 1. the node name itself, prepended by root symbol

 res.add("/" + t.getNodeName());
 List<AbstractTree<String>> children = t.getChildren();

 // all of the childrens' sets prepended by this one

 if (children != null) {

  for (AbstractTree<String> child : children) {

   Multiset<String> sub = createSF(child, children);

   for (String nextOne : sub) {
    if (nextOne.indexOf("//") == 0) {
     res.add(nextOne);
    } else {
     res.add("/" + nextOne);
     res.add("/" + t.getNodeName() + nextOne);
    }
   }
  }
 }

 // 2. all of the following siblings' sets, prepended by this one

 if (allSiblings != null) {

   // node is neither original root nor leaf 
   // first, find current node

  int currentNodePos = 0;
  int ptrPos = 0;

  for (AbstractTree<String> node : allSiblings) {
   if (node == t) {
    currentNodePos = ptrPos;
   }
   ptrPos++;
  }

   // 3. then add all paths deriving from (all) following siblings 

  for (int i = currentNodePos + 1; i < allSiblings.size(); i++) {
   AbstractTree<String> sibling = allSiblings.get(i);

   Multiset<String> sub = createSF(sibling, allSiblings);

   for (String nextOne : sub) {
    if (nextOne.indexOf("//") == 0) {
     res.add(nextOne);
    } else {
     res.add("/" + nextOne);
     res.add("/" + t.getNodeName() + nextOne);
    }
   }
  }
 }
 return res;
}

And now the optimization which is (currently) in a subclass:

private Map<AbstractTree<String>, Multiset<String>> lookupTable = new HashMap<AbstractTree<String>, Multiset<String>>();

public Multiset<String> createSF(AbstractTree<String> t,
  List<AbstractTree<String>> allSiblings) {

 Multiset<String> lookup = lookupTable.get(t);
 if (lookup != null) {
  return lookup;
 } else {

  Multiset<String> res = super.createSF(t, allSiblings);

  lookupTable.put(t, res);
  return res;
 }
}

My trouble is that the optimized version runs out of heap space (the vm args are set at -Xms2g -Xmx2g) and is very slow on moderately large input. Can anyone see a way to improve on this?

+1  A: 

Run the code through a profiler. That's the only way to get real facts about the code. Everything else is just guesswork.

Joachim Sauer
+1  A: 

"generates all paths in a tree as xpath statements"

How many paths are you creating? This can be non-trivial. The number of paths should be O( n log n ), but the algorithm could be much worse depending on what representation they use for children of a parent.

You should profile the simple enumeration of paths without worrying about the bag storage.

S.Lott
A: 

Your code eats RAM exponentially. So one layer more means children.size() times more RAM.

Try to use a generator instead of materializing the results: Implement a Multiset which does not calculate the results beforehand but iterates through the tree structure as you call next() on the set's iterator.

Aaron Digulla