ansaurus

Question

Answer 1

A:

Where does jvisualvm indicate that the performance bottleneck is?

Thorbjørn Ravn Andersen 2009-07-24 12:07:33

I don't know how to use jvisualvm, but I have timed the methods, using a 100MB XML tree. making paths... Done [3614ms]creating hash codes... Done [962ms] Total Done [4576ms]

Robert 2009-07-24 12:13:04

It won't identify the core problem in this case, but you really should learn how to use a profiler such as visualvm. It's the only professional way to attack performance problems.

Michael Borgwardt 2009-07-24 12:24:20

I will strongly recommend learning how to use a profiler. The lowest hanging fruit there is jvisualvm.

Thorbjørn Ravn Andersen 2009-07-24 12:32:07

@Michael, a profiler would indicate that there is no single bottleneck, so it is most likely the algorithm :)

Thorbjørn Ravn Andersen 2009-07-24 12:32:49

Answer 2

+1 A:

I think your main problem is the amount of duplicate data you are producing: for every single leaf of the tree, you will make a copy of the entire path leading up to that leaf and compute the hash for that path. i.e. if you have 50,000 leaves under one top-level node, then the path name of that node will be copied 50,000 times and its hash computed 50,000 times.

If you could organize your data so that shared path prefixes are reused as references between leaves and hash calculations for these prefixes are cached and reused, you could drastically reduce the actual amount of work to be done.

Michael Borgwardt 2009-07-24 12:20:20

This sounds like an interesting solution - do you have an example of such a method?

Robert 2009-07-24 12:42:40

I don't have the time to supply working code, but basically instead of building the path in StringBuilder instances, represent a path as a list of path elements, each with a name and a partial hash down to that element.

Michael Borgwardt 2009-07-24 12:55:19

Answer 3

A:

You first create a list of all paths and then once you have them all you compute hashes. Size of list of all those paths is O(n^3) (there's O(n^2) paths, each O(n) long) Why? Why not just compute the hashes as you traverse the tree? This way you'll take whole one n out of your time complexity.

The code for proper solution (result ends up in passed in list of integers):

public void getPaths(AbstractTree<String> tree, StringBuilder parentPath, 
    List<Integer> list)
  StringBuilder newPath = parentPath.clone();
  newPath.append("/");
  newPath.append(tree.getNodeName());
  list.add(newPath.toString().hashCode());
  if (!tree.isLeaf()){    
     Iterator<AbstractTree<String>> child = tree.getChildren().iterator();
     for (AbstractTree<String> child : tree.getChildren()){
       getPaths(child, newPath, list)
     }
  }  
}

This still is O(n^2). That is because of hashing of O(n^2) worth of strings (each node has a path length proportional to its depth) and you could bring it down even to O(N) if you had a hash that for given node takes only a hash of the path of its parents and modifies it in some way.

Furhter optimizations include: - parallel tree traversal - using smarter hashing (i.e. hash of a child is a function of child and hash of the parent path, not whole parent path).

Marcin 2009-07-24 12:20:30

tried to compute hashes during tree traveral, but it gives the wrong answer - perhaps you can see why? (see original question for code)

Robert 2009-07-24 12:41:10

I improved the solution. It should be better now.

Marcin 2009-07-24 13:40:31

I'm a little confused by this solution. First, how do you get the result? passing a list as parameter makes a copy of the list and doesn't modify the original list. Secondly, the clone method is not visible to parentPath.

Robert 2009-07-24 14:18:10

Robert, passed in list is NOT copied. It is passed in by reference, so you can freely modify it. The SAME problem was in your solution.

Marcin 2009-07-25 07:24:39

Answer 4

A:

I think complexity is still the same. No matter if you use a inline creation of hashes (O(n^2)) or doing it after recursion (O(n^2 + n) = O(n^2)). The only opportunity to find a fast way is to do some of the work at another place. e.g. you can hash the path while inserting a node and only gather all hashes at another point.

Dirk 2009-07-24 14:27:57

ansaurus

tags:

views:

answers:

Slow building list of paths

related questions