views:

319

answers:

4

I've written an algorithm which calculates and stores all paths of a DAG, and it works nicely on small graphs - but now i'm looking to improve it's efficiency to run over larger graphs. The core logic of the algorithm is in createSF() and makePathList() below, the other methods are helpers - I can see that append is a bottleneck. However, I guess the biggest help would be to devise a data structure that can store paths in a dictionary, since many of the paths are composed of other paths, this is the crux of my question.

private Multiset<String> paths = new Multiset<String>();    

public Multiset<String> createSF(DAGNode n) {

    for (DAGNode succ : n.getSuccessors())
        createSF(succ);
    if (!n.isVisited())
        for (String s : makePathList(n)) 
            paths.put(s);

    n.setVisited(true);
    return paths;
}

private List<String> makePathList(DAGNode n) {
    List<String> list = new ArrayList<String>();

    list.add(n.getLabel());
    for (DAGNode node : n.getSuccessors())
        list.addAll(append(n.getLabel(), makePathList(node)));

return list;
}

private List<String> append(String s, List<String> src) {
    List<String> ls = new ArrayList<String>();
    for (String str : src) 
    ls.add(s + "/" + str);

    return ls;
}

EDIT:

I'm now using a path object to represent paths and have pin-pointed the bottle neck to these two methods:

public List<Path> createPathList(Tree n) {
    List<Path> list = new ArrayList<Path>();
    list.add(new Path(n.getNodeName()));
    for (Tree node : n.getSuccessors()) {
        list.addAll(append(n.getNodeName(), createPathList(node)));
    }
    return list;
}

public List<Path> append(String s, List<Path> src) {
    List<Path> ls = new ArrayList<Path>();
    for (Path path : src) {
        ls.add(new Path(path, s));
    }
    return ls;
}

Trouble is when a graph is size M these methods will be called M times, this means there is a lot of lists being created here... is there a more efficient way to build up the return for createPathList()?

A: 

Take a look at the DOT source code from Graphviz, it might give you some ideas.

Robert S. Barnes
+5  A: 

In order to answer this question, it is necessary to understand why do you need the list of paths. The list of paths does not give you any additional information over what you have in the DAG representation.

If you want to calculate things for each path separately, or calculate something like sum/min/max over all paths, it too could be done using the DAG itself.

If you do insist on saving separate paths, one option would be to convert your DAG into a variant of a Trie. Another option could be to use some variant of the Lempel-Ziv representation. It depends on your DAG types, and what you expect to do with the paths information.

Anna
I specifically need it in the form of a multiset of paths because I am using it in that form for another algorithm which determines the entropic complexity.
Robert
In this case, storing the paths in a different data structure won't help you, as you need a full length string representation.
Anna
Edit: if you can change the parameters of the second algorithm, Lempel-Ziv (dictionary) style representation might save some space, and work faster.
Anna
A: 

Please allow me to put two (hopefully helpful) comments first:

I had some difficulties understanding your code, because some of the method names misled me. From looking at the names, I expected something else. May I suggest some refactoring:

makePathList -> createPathList  // you actually create a List here
append -> createPathList // yes, same name as above because it creates the same
                         // type of list, just with different parameters

(removed part of the answer that became obsolete after Robert's edit)

As Margus said, replacing the String concatenation with a StringBuilder append chain doesn't increase your performance. Compilers may optimize String concatenations to StringBuilder appends anyway (I've seen such byte code).

You could try to convert the DAG into a tree structure. Introduce an invisible root with all nodes as direct children. Now for each node add it's successors (child and/or sibling). The number of leaves now should be equal to the number of path and every graph from the root to any leaf is one path in the DAG.

Edit

A small improvement - it's micro-optimization but at least it will leave less garbage:

private List<String> append(String node, List<String> allPathsStartingAfterNode) {
    List<String> allPathsStartingAtNode = new ArrayList<String>();
    String nodeWithSeparator = node + "/";

    for (String aPathStartingAfterNode : allPathsStartingAfterNode) {
        allPathsStartingAtNode.add(nodeWithSeparator + aPathStartingAfterNode);
    }

    return allPathsStartingAtNode;
}
Andreas_D
sorry, some superfluous code left in from when I was using a tree as input
Robert
A: 

A simple modification (depending on how you use the data) might be to load the paths lazily, that way if you tend to only use a few paths you'll never even generate some paths.

Of course, this depends entirely on expected use

Martin