views:

57

answers:

3

I have a large (100,000+ nodes) Directed Acyclic Graph (DAG) and would like to run a "visitor" type function on each node in order, where order is defined by the arrows in the graph. i.e. all parents of a node are guaranteed to be visited before the node itself.

If two nodes do not refer to each other directly or indirectly, then I don't care which order they are visited in.

What's the most efficient algorithm to do this?

+2  A: 

You would have to perform a topological sort on the nodes, and visit the nodes in the resulting order.

The complexity of such algorithm is O(|V|+|E|) which is quite good. You want to traverse all nodes, so if you would want a faster algorithm than that, you would have to solve it without even looking at all edges, which would be dangerous, because one single edge could havoc the order completely.

aioobe
I was afraid that might be the case - I was hoping there might be some cheap O(n) solution....
mikera
Regarding the question clarification, if a indirectly refers to c then yes clearly a needs to be visited before c. But I think that is consistent with the question? I only don't care about the order if there is no direct or indirect link, this is an indirect link so I therefore care.
mikera
Sorry, I misunderstood your question. See my updated answer.
aioobe
Thanks! I managed to get this to work with a topological sort, my approach was to traverse the tree in depth-first order, write the visited nodes out to an array and then visit that array in reverse....
mikera
+1  A: 

There are some answers here: http://stackoverflow.com/questions/1320688/good-graph-traversal-algorithm

and here: http://en.wikipedia.org/wiki/Topological_sorting

In general, after visiting a node, you should visit its related nodes, but only the nodes that are not already visited. In order to keep track of the visited nodes, you need to keep the IDs of the nodes in a set (or map), or you can mark the node as visited (somehow).

If you care about the topological order, you must first get hold of a collection of all the un-traversed links ("remaining links") to a node, sorted by the id of the referenced node (typically: map(node-ID -> link-count)). If you haven't got that, you might need to build it using an approach similar to the one above. Then, start by visiting a node whose remaining incoming link count is zero. For each link from that node, reduce the remaining link count for each related node, adding the related node to the set of nodes-to-visit (or just visiting the node) if the count reaches zero.

eirikma
+1  A: 

As mentioned in the other answers, this problem can be solved by Topological Sorting.

A very simple algorithm for that (not the most efficient):

Keep an array (or map) indegree[] where indegree[node]=number of incoming edges of node

while there is at least one node n with indegree[n]=0:
   for each node n in nodes where indegree[n]>0:
      visit(n)
      indegree[n]=-1 # mark n as visited
      for each node x adjacent to n:
          indegree[x]=indegree[x]-1 # its parent has been visited, so one less edge coming into it
MAK