views:

99

answers:

1

I wrote a source-removal algorithm to sort some dependencies between tables in our database, and it turns out we have a cycle. For simplicity, let's say we have tables A, B, C, and D. The edges are like this:

(A, B)
(B, A)
(B, C)
(C, D)
(D, A)

As you can see, there are two cycles here. One is between A and B and another is between all four of them. Will this type of sort always choke on the largest cycle? Or is that not necessarily the case?

+1  A: 

By source-removal I presume you mean at each step removing a node with no incoming edges.

What I think you are asking for is finding the maximal Euler tour of your graph (i.e. a cycle with unique edges, while nodes can be repeated).

Obviously, no vertex in a cycle can be removed (no vertex in the cycle would have zero incoming edges), so this algorithm certainly preserves all cycles (and the biggest), but still, it doesn't help you find it, the remaining edges are not guaranteed to be part of any cycle (I can easily construct an example where the algorithm you describe retains all edges, while the largest cycle is merely of size two, thus not too helpful in finding the latter).

Here is how you can do it instead:

You are interested in recognizing back edges, i.e., in the traversal, an edge which points back to an ancestor (in the DFS tree, which is induced by edges of visiting nodes for the first time) of the visited node. For example, if the DFS stack has nodes [A->B->C->D] and while you explore D you find an edge D->B, that's a back edge. Each back edge defines a cycle.

More importantly, the cycles induced by back-edges are a basic set of cycles of the graph. "A basic set of cycles": you can construct all cycles of the graph just by UNIONing and XORing cycles of the basic set. For example, consider the cycles [A1->A2->A3->A1] and [A2->B1->B2->B3->A2]. You can union them to the cycle: [A1->A2->B1->B2->B3->A2->A3->A1]. Since you want the maximal cycle, you don't need to consider XORs.

  • Construct the maximal cycle by UNIONing all basic cycles that intersect at a node. (If you do it carefully this should also have a linear time complexity).

On the other hand, if you required a maximal cycle with no repeating vertex, that's going to be much harder than linear :)

Dimitris Andreou
I am actually not trying to find one type of cycle or another. I'd actually prefer to have no cycles (source removal *does* require a DAG). However, I noticed this behavior from a program that I wrote and was just curious if that behavior is to always be expected.
Jason Baker