views:

269

answers:

5

I am trying to find a fast algorithm with modest space requirements to solve the following problem.

For each vertex of a DAG find the sum of its in-degree and out-degree in the DAG's transitive closure.

Given this DAG:

DAG from Wikipedia

I expect the following result:

Vertex #   Reacability Count  Reachable Vertices in closure
   7             5            (11, 8, 2, 9, 10)
   5             4            (11, 2, 9, 10)
   3             3            (8, 9, 10)
  11             5            (7, 5, 2, 9, 10)
   8             3            (7, 3, 9)
   2             3            (7, 5, 11)
   9             5            (7, 5, 11, 8, 3)
  10             4            (7, 5, 11, 3)

It seems to me that this should be possible without actually constructing the transitive closure. I haven't been able to find anything on the net that exactly describes this problem. I've got some ideas about how to do this, but I wanted to see what the SO crowd could come up with.

+1  A: 

OMG IT'S WRONG! SORRY!

I'll leave this up until a good alternative is available. CW-ed so feel free to discuss and expand on this if possible.


Use dynamic programming.

for each vertex V
   count[V] = UNKNOWN
for each vertex V
   getCount(V)


function getCount(V)
   if count[V] == UNKNOWN
      count[V] = 0
      for each edge (V, V2)
        count[V] += getCount(V2) + 1          
   return count[V]

This is O(|V|+|E|) with adjacency list. It counts only the out-degree in the transitive closure. To count the in-degrees, call getCount with edges reversed. To get the sum, add up the counts from both calls.


To see why this is O(|V|+|E|), consider this: each vertex V will be visited exactly 1 + in-degree(V) times: once directly on V, and once for every edge (*, V). On subsequent visits, getCount(V), simply returns the memoized count[V] in O(1).

Another way to look at it is to count how many times each edge will be followed along: exactly once.

polygenelubricants
Does this count the in-reachability as well? For instance the DAG `A -> B -> C` should give the result `2` for all 3 nodes.
KennyTM
Addressed that. Please double check (I'm coming up with this as I go).
polygenelubricants
I'm not sure you're complexity estimate is accurate. Wouldn't the recursive call to getCount() imply that the vertices may be visited more than once if they have in-degree > 1.
ChrisH
Addressed that. Please double check (I'm coming up with this as I go).
polygenelubricants
Ahh yes, I overlooked the memoization.
ChrisH
This algorithm is incorrect. On a diamond graph A -> B, A -> C, B -> D, C -> D it returns getCount(A)=4, whereas the correct answer is 3.
jkff
@jkff OMG you are right! Let me think about this...
polygenelubricants
A: 

For each node, use BFS or DFS to find the out-reachability.

Do it again for the reversed direction to find the in-reachability.

Time complexity: O(MN + N2), space complexity: O(M + N).

KennyTM
A: 

I assume that you have a list of all vertices, and that each vertex has an id and a list of vertices you can directly reach from it.

You can then add another field (or however you represent that) that holds the vertices you can also indirectly reach. I would do this in a recursive depth-first search, memoizing the results in the field of the respective reached nodes. As a data structure for this, you would perhaps use some sort of tree which allows efficient removal of duplicates.

The in-reachability can be done separately by adding the inverse links, but it can also be done in the same pass as the out-reachability, by accumulating the currently out-reaching nodes and adding them to the corresponding fields of the reached nodes.

Svante
That was my first implementation. This technique works, but it is constructs the full transitive closure of the underlying DAG.
ChrisH
Well, you need some way to eliminate duplicates.
Svante
True, but it may not be required to store the full closure all at once. I think it should be possible to get an answer for some vertices before the whole closure is built and discard the reachability for completed vertices before the entire traversal is compelete.
ChrisH
+1  A: 

For an exact answer, I think it's going to be hard to beat KennyTM's algorithm. If you're willing to settle for an approximation, then the tank counting method ( http://www.guardian.co.uk/world/2006/jul/20/secondworldwar.tvandradio ) may help.

Assign each vertex a random number in the range [0, 1). Use a linear-time dynamic program like polygenelubricants's to compute for each vertex v the minimum number minreach(v) reachable from v. Then estimate the number of vertices reachable from v as 1/minreach(v) - 1. For better accuracy, repeat several times and take a median of means at each vertex.

Although I am looking for an exact answer I like your post. Thanks for the article, that was very interesting, and the tank counting method is good to be aware of. +1 for thinking outside the box.
ChrisH
A: 

I have constructed a viable solution to this question. I base my solution on a modification of the topological sorting algorithm. The algorithm below calculates only the in-degree in the transitive closure. The out-degree can be computed in the same fashion with edges reversed and the two counts for each vertex summed to determine the final "reachability count".

for each vertex V
   inCount[V] = inDegree(V)   // inDegree() is O(1)
   if inCount[V] == 0
      pending.addTail(V)

while pending not empty
   process(pending.removeHead())

function process(V)
   for each edge (V, V2)
      predecessors[V2].add(predecessors[V])   // probably O(|predecessors[V]|)
      predecessors[V2].add(V)
      inCount[V2] -= 1
      if inCount[V2] == 0
          pending.add(V2)
   count[V] = sizeof(predecessors[V])         // store final answer for V
   predecessors[V] = EMPTY                    // save some memory

Assuming that the set operations are O(1), this algorithm runs in O(|V| + |E|). It is more likely, however, that the set union operation predecessors[V2].add(predecessors[V]) makes it somewhat worse. The additional steps required by the set unions depends on the shape of the DAG. I believe the worst case is O(|V|^2 + |E|). In my tests this algorithm has shown better performance than any other I have tried so far.

Furthermore, by disposing of predecessor sets for fully processed vertices, this algorithm will typically use less memory than most alternatives. It is true, however, that the worst case memory consumption of the above algorithm matches that of constructing the transitive closure, but that will not be true for most DAGs.

ChrisH