Why DFS and not BFS for finding cycle in graphs.

+3 A:

Depth first search is more memory efficient than breadth first search as you can backtrack sooner. It is also easier to implement if you use the call stack but this relies on the longest path not overflowing the stack.

Also if your graph is directed then you have to not just remember if you have visited a node or not, but also how you got there. Otherwise you might think you have found a cycle but in reality all you have is two separate paths A->B but that doesn't mean there is a path B->A. With a depth first search you can mark nodes as visited as you descend and unmark them as you backtrack. See comments for a performance improvement on this algorithm.

For the best algorithm for detecting cycles in a directed graph you could look at Tarjan's algorithm.

Mark Byers 2010-05-19 21:44:45

(Memory efficient because you get to backtrack sooner, and easier to implement because you can just let the stack take care of storing the open list instead of having to explicitly maintain it.)

Amber 2010-05-19 21:47:47

IMO, it's only easier if you can rely on tail recursion.

Hank Gay 2010-05-19 21:48:58

+1 for pointing out the double path scenario.

Eyal Schneider 2010-05-19 21:57:33

Comments added to answer, thanks!

Mark Byers 2010-05-19 22:12:48

"unmark them as you backtrack" - at your own peril! This can easily lead to O(n^2) behavior, specifically such a DFS would misunderstand cross edges as "tree" edges ("tree" edges would also be a misnomer since they wouldn't actually form a tree anymore)

Dimitris Andreou 2010-05-20 08:15:08

@Dimitris Andreo: You can use three visited states instead of two to improve performance. With directed graphs there's a difference between 'I've seen this node before' and 'This node is part of a loop'. With undirected graphs they are equivalent.

Mark Byers 2010-05-20 12:12:45

Exactly, you definitely need a third state (to make the algorithm linear), so you should consider revising that part.

Dimitris Andreou 2010-05-20 12:49:08

A:

DFS is easier to implement
Once DFS finds a cycle, the stack will contain the nodes forming the cycle. The same is not true for BFS, so you need to do extra work if you want to also print the found cycle. This makes DFS a lot more convenient.

IVlad 2010-05-19 21:51:24

A:

If you place a cycle at a random spot in a tree, DFS will tend to hit the cycle when it's covered about half the tree, and half the time it will have already traversed where the cycle goes, and half the time it will not (and will find it on average in half the rest of the tree), so it will evaluate on average about 0.5*0.5 + 0.5*0.75 = 0.625 of the tree.

If you place a cycle at a random spot in a tree, BFS will tend to hit the cycle only when it's evaluated the layer of the tree at that depth. Thus, you usually end up having to evaluate the leaves of a balance binary tree, which generally results in evaluating more of the tree. In particular, 3/4 of the time at least one of the two links appear in the leaves of the tree, and on those cases you have to evaluate on average 3/4 of the tree (if there is one link) or 7/8 of the tree (if there are two), so you're already up to an expectation of searching 1/2*3/4 + 1/4*7/8 = (7+12)/32 = 21/32 = 0.656... of the tree without even adding the cost of searching a tree with a cycle added away from the leaf nodes.

In addition, DFS is easier to implement than BFS. So it's the one to use unless you know something about your cycles (e.g. cycles are likely to be near the root from which you search, at which point BFS gives you an advantage).

Rex Kerr 2010-05-19 21:53:05

A lot of magic numbers there. I disagree with the "DFS is faster" arguments. It depends entirely on the input, and no input is more common than another in this case.

IVlad 2010-05-19 21:56:51

@Vlad - The numbers aren't magic. They are means, are are stated as such, and are almost trivial to calculate given the assumptions I stated. If approximating by the mean is a bad approximation, that would be a valid criticism. (And I explicitly stated that if you could make assumptions about the structure, the answer could change.)

Rex Kerr 2010-05-20 03:10:56

@Rex Kerr - the numbers are magical because they don't mean anything. You took a case DFS does better on and extrapolated those results to the general case. Your statements are unfounded: "DFS will tend to hit the cycle when it's covered about half the tree": prove it. Not to mention that you cannot talk about cycles in a tree. A tree does not have a cycle by definition. I just don't see what your point is. DFS will go one way until it hits a dead end, so you have no way of knowing how much of the GRAPH (NOT tree) it will explore on average. You just picked a random case that proves nothing.

IVlad 2010-05-20 07:35:34

@Vlad - All noncyclic fully connected undirected graphs are (unrooted undirected) trees. I meant "a graph that would be a tree save for one spurious link". Perhaps this isn't the main application for the algorithm--maybe you want to find cycles in some tangled graph that has very many links that make it not a tree. But if it is tree-like, averaged over all graphs, any node is equally likely to be the source of said spurious link, which makes the expected tree coverage 50% when the link is hit. So I accept that the example may not have been representative. But the math should be trivial.

Rex Kerr 2010-05-20 13:40:51

+1 A:

A BFS could be reasonable if the graph is undirected (be my guest at showing an efficient algorithm using BFS that would report the cycles in a directed graph!), where each "cross edge" defines a cycle. If the cross edge is {v1, v2}, and the root (in the BFS tree) that contains those nodes is r, then the cycle is r ~ v1 - v2 ~ r (~ is a path, - a single edge), which can be reported almost as easily as in DFS.

The only reason to use a BFS would be if you know your (undirected) graph is going to have long paths and small path cover (in other words, deep and narrow). In that case, BFS would require proportionally less memory for its queue than DFS' stack (both still linear of course).

In all other cases, DFS is clearly the winner. It works on both directed and undirected graphs, and it is trivial to report the cycles - just concat any back edge to the path from the ancestor to the descendant, and you get the cycle. All in all, much better and practical than BFS for this problem.

Dimitris Andreou 2010-05-20 08:53:01

ansaurus

tags:

views:

answers:

Why DFS and not BFS for finding cycle in graphs.

related questions