views:

688

answers:

4

I'm using the Lengauer and Tarjan algorithm with path compression to calculate the dominator tree for a graph where there are millions of nodes. The algorithm is quite complex and I have to admit I haven't taken the time to fully understand it, I'm just using it. Now I have a need to calculate the dominator trees of the direct children of the root node and possibly recurse down the graph to a certain depth repeating this operation. I.e. when I calculate the dominator tree for a child of the root node I want to pretend that the root node has been removed from the graph.

My question is whether there is an efficient solution to this that makes use of immediate dominator information already calculated in the initial dominator tree for the root node? In other words I don't want to start from scratch for each of the children because the whole process is quite time consuming.

Naively it seems it must be possible since there will be plenty of nodes deep down in the graph that have idoms just a little way above them and are unaffected by changes at the top of the graph.

BTW just as aside: it's bizarre that the subject of dominator trees is "owned" by compiler people and there is no mention of it in books on classic graph theory. The application I'm using it for - my FindRoots java heap analyzer - is not related to compiler theory.

Clarification: I'm talking about directed graphs here. The "root" I refer to is actually the node with the greatest reachability. I've updated the text above replacing references to "tree" with "graph". I tend to think of them as trees because the shape is mainly tree-like. The graph is actually of the objects in a java heap and as you can imagine is reasonably hierarchical. I have found the dominator tree useful when doing OOM leak analysis because what you are interested in is "what keeps this object alive?" and the answer ultimately is its dominator. Dominator trees allow you to <ahem> see the wood rather than the trees. But sometimes lots of junk floats to the top of the tree so you have a root with thousands of children directly below it. For such cases I would like to experiment with calculating the dominator trees rooted at each of the direct children (in the original graph) of the root and then maybe go to the next level down and so on. (I'm trying not to worry about the possibility of back links for the time being :)

+1  A: 

Judging by the lack of comments, I guess there aren't many people on Stackoverflow with the relevent experience to help you. I'm one of those people, but I don't want such an interesting question go down with with a dull thud so I'll try and lend a hand.

My first thought is that if this graph is generated by other compilers would it be worth taking a look at an open-source compiler, like GCC, to see how it solves this problem?

My second thought is that, the main point of your question appears to be avoiding recomputing the result for the root of the tree.

What I would do is create a wrapper around each node that contains the node itself and any pre-computed data associated with that node. A new tree would then be reconstructed from the old tree recursively using these wrapper classes. As you're constructing this tree, you'd start at the root and work your way out to the leaf nodes. For each node, you'd store the result of the computation for all the ancestory thus far. That way, you should only ever have to look at the parent node and the current node data you're processing to compute the value for your new node.

I hope that helps!

Simon Johnson
Thanks Simon. Unfortunately the algorithm is fiendishly complex (at least to my eyes) and doesn't do anything straightforward like just recursively descending the tree. Eg it follows ancestor chains. Have a look at this just to get a feel for it: http://www.cl.cam.ac.uk/~mr10/lengtarj.pdf.
Dave Griffiths
+1  A: 

Could you elaborate on what sort of graph you're starting with? I don't see how there is any difference between a graph which is a tree, and the dominator tree of that graph. Every node's parent should be its idom, and it would of course be dominated by everything above it in the tree.

Jay Kominek
Hi, you're right, please see clarification above.
Dave Griffiths
+2  A: 

boost::lengauer_tarjan_dominator_tree_without_dfs might help.

J.F. Sebastian
Yes, that may help, thanks. I worry about the other part of the algorithm though which normally takes of the same order of time as the dfs but is sometimes worse (and is why you definitely need the path compression).
Dave Griffiths
A: 

I do not fully understand your question, but it seems to me you want to have some incremental update feature. I researched a while ago what algorithms are their but it seemed to me that there's no known way for large graphs to do this quickly (at least from a theoretical standpoint).

You may just search for "incremental updates dominator tree" to find some references.

I guess you are aware the Eclipse Memory Analyzer does use dominator trees, so this topic is not completely "owned" by the compiler community anymore :)

kohlerm