ansaurus

Question

Is there an effient way of determining whether a leaf node is reachable from another arbitrary node in a Directed Acyclic Graph?

Answer 1

+2 A:

Just color (keep track of) visited nodes.

Sample in Python:

def reachable(nodes, edges, start, end):
  color = {}
  for n in nodes:
    color[n] = False
  q = [start]
  while q:
    n = q.pop()
    if color[n]:
      continue
    color[n] = True
    for adj in edges[n]:
      q.append(adj)
  return color[end]

lispmachine 2009-05-28 15:50:01

+1 because that'd work given the info given, but these nodes are our business objects and we'd need to uncolor all the nodes after we're done with them. I already mentioned keeping track of nodes already visited though

Davy8 2009-05-28 16:00:40

The structure which contains color data does not need to be part of those business objects. It would be cumbersome (and not thread safe) to keep additional data for every algorithm. A simple mapping of those objects to boolean (like dict in example above) will suffice.

lispmachine 2009-05-28 16:10:53

Good point. The other thing is that as I mentioned in my edit (not sure if you saw it) I don't have a list of all nodes and edges, I have the single node and each node has a list of it's parents that I need to traverse recursively. I'm guessing it's just a matter of passing around the "color" list.

Davy8 2009-05-28 16:20:44

Answer 2

+2 A:

There are a few methods that each may be faster depending on your structure, but in general what youre going to want is a traversal.

A depth first search, goes through each possible route, keeping track of nodes that have already been visited. It's a recursive function, because at each node you have to branch and try each child node of it. There's no faster method if you dont know which way to look for the object you just have to try each way! You definitely need to keep track of where you have already been because it would be wasteful otherwise. It should require on the order of the number of nodes to do a full traversal.

A breadth first search is similar but visits each child of the node before "moving on" and as such builds up layers of distance from the chosen root. This can be faster if the destination is expected to be close to the root node. It would be slower if it is expected to be all the way down a path, because it forces you to traverse every possible edge.

Youre right about maybe keeping a list of known root nodes, the tradeoff there is that you basically have to do the search whenever you alter the graph. If you are altering the graph rarely this is acceptable, but if you alter the graph more frequently than you need to generate this information, then of course it is too costly.

EDIT: Info Update. It sounds like we are actually looking for a path between two arbitrary nodes, the root/leaf semantic keeps getting switched. The DepthFirstSearch (DFS) starts at one node, and then for each unvisited child, recurse. Break if you find the target node. Due to the way recursion evaluates, this will traverse all the way down the 'left' path, then enumerate nodes at this distance before ever getting to the 'right' path. This is time costly and inefficient if the target node is potentially the first child on the right. BreadthFirst walks in steps, covering all children before moving forward. Because your graph is bottom heavy like a tree, both will be approximately the same execution time.

When the graph is bottom heavy you might be interested in a reverse traversal. Start at the target node and walk upwards, because there are relatively fewer nodes in this direction. So long as the nodes in general have more parents than children, this direction will be much faster. You can also combine the approaches, stepping one up and one down , then comparing lists of nodes, and meeting somewhere in the middle. (this combination might seem the fastest if you ignore that twice as much work is done at each step).

However, since you said that your graph is stored as a list of lists of children, you have no real way of traversing the graph backwards. A node does not know what its parents are. This is a problem. To fix it you have to get a node to know what its parents are by adding that data on graph update, or by creating a duplicate of the whole structure (which you have said is too large). It will need the whole structure to be rewritten, which sounds probably out of the question due to it being a large database at this point. There's a lot of work to do. http://en.wikipedia.org/wiki/Graph_(data_structure)

Karl 2009-05-28 15:59:50

Well, we need to read this information about 3 times as often (very rough guesstimate) as the graph changes, but since as I just edited, the graph's more bottom heavy and I'm looking for nodes at the top.

Davy8 2009-05-28 16:12:16

This is getting more and more complicated. Are you starting from a leaf and looking for a root, or are you starting from a root and looking for a leaf? The procedures will be different, and youve asked both... are we looking for a path between two arbitrary nodes? That might be a clearer solution.

Karl 2009-05-28 17:01:37

Sorry for the confusion, I'm starting from an arbitrary node and looking for all roots (don't care about the path)

Davy8 2009-05-28 17:57:14

They key point being I'm not looking for one specific node, I'm looking for ALL root nodes (i.e. nodes without parents, each node may have more than one parent) that are reachable from the specified node. I have no information on any nodes other than object references to a node's children and a node's parents.

Davy8 2009-05-28 18:00:06

(No information on any of the rest of the nodes in the graph)

Davy8 2009-05-28 18:00:50

Good that you have a list of a node's parents, so you can walk upwards. Since you need multiple roots you will have to do a lot of traversing, and you can always go in one direction, and if two paths merge (visited node is seen again) you can ignore it and let the other track take care of it. Just needs a couple global lists.

Karl 2009-05-28 20:12:10

Answer 3

A:

For a vertex x you want to compute a bit array f(x), each bit corresponds to a root vertex Ri, and 1 (resp 0) means "x can (resp can't) be reached from root vertex Ri.

You could partition the graph into one "upper" set U containing all your target roots R and such that if x in U then all parents of x are in U. For example the set of all vertices at distance <=D from the closest Ri.

Keep U not too big, and precompute f for each vertex x of U.

Then, for a query vertex y: if y is in U, you already have the result. Otherwise recursively perform the query for all parents of y, caching the value f(x) for each visited vertex x (in a map for example), so you won't compute a value twice. The value of f(y) is the bitwise OR of the value of its parents.

Eric Bainville 2009-06-02 13:26:31

ansaurus

tags:

views:

answers:

Is there an effient way of determining whether a leaf node is reachable from another arbitrary node in a Directed Acyclic Graph?

related questions