views:

700

answers:

6

I have a graph which contains an unknown number of disconnected subgraphs. What's a good algorithm (or Java library) to find them all?

+2  A: 

I'm assuming you want to find all the (strongly) connected components? For that you can use Tarjan's algorithm (a variant of DFS)

oggy
A: 

What about a breadth first search to find all connected nodes? Once you have the list of connected nodes, you subtract this list from the list of all nodes. You end up with a list of disconnected nodes

MedicineMan
+2  A: 

JGraphT is a nice open source graphing library licensed under the LGPL license. I have used it in the past to deal with graphs and detecting cycles within the graphs. It is also fairly easy to use, and you can use JGraph to visualize the graphs.

aperkins
ConnectivityInspector class should be what is needed.
serg
Having never had to do much other than prove existence/non-existence of cycles, I was not sure, so I simply linked to the library - it has been pretty solid for our project.
aperkins
+6  A: 

I think what you are looking for is generally called a Flood Fill. It is up to you whether you traverse the graph through a BFS or a DFS.

Basically you take an unlabeled (AKA uncoloured) node and assign a new label to it. You assign the same label to all nodes adjacent to that one, and so on to all nodes that are reachable from that node.

When no more reachable nodes can be labeled, you start over by picking another unlabeled node. Notice that the fact that this new node is unlabeled implies that it is not reachable from our earlier node and is thus in a different disconnected component.

When there are no more unlabeled nodes, the number of distinct labels you had to use is the number of components of the graph. The label for each node tells you which node is part of which component.

MAK
Thanks, this a great place to start and a very practical answer.
Ollie G
+1  A: 

There are many facets of this question that are not fully explained, so I'm going to give a somewhat exhaustive answer. My tendency to post walls-of-text notwithstanding. :/ Note also that I'm assuming this is a homework question or self-education question, so I'm not going to give a straight-up answer.

The two basic algorithms for detecting connectivity of a graph is Depth First Search and Breadth First Search. Those are really the 2 starting points you want to look at. Both will get you to the solution, but in different ways, and its hard to argue which is "better" without considering some pretty in-depth aspects of the problem. But let's move on.

As I mentioned before, you left out some important details, and I'll touch upon some possibilities here.

Is your graph directed or undirected? and do you consider connectivity in the "strong" sense (in which case, see oggy's answer), or connectivity in the "weak" sense? Depending on your answer, you will have to approach your algorithm in a subtly different way. Note that, for an undirected graph, weak and strong connectivity are equivalent, so that's nice. But you'll have to keep the structure of the graph in mind regardless, while implementing or finding an algorithm.

Furthermore, there is the question of what you mean by "finding the subgraphs" (paraphrase). Usually graph connectivity is a decision problem -- simply "there is one connected graph" or "there are two or more sub-graphs (aka, it's disconnected)". Having an algorithm for that requires the least amount of bookwork, which is nice. :) The next step up would be the count of graphs, literally the number of them, and that bookwork is also not so bad. Penultimately, you may want a list of nodes in each sub-graph. Lastly, you may want to literally copy out the sub-graphs, edges and all (so the return type would be a list of graphs, I suppose, with an implied invariant that each graph is connected). None of these different result-types would require a different algorithm, but will certainly require a different approach to the bookwork.

All of this may seem like a ridiculous amount of overkill for what is a pretty basic question, but I thought I'd just highlight all of the aspects involved in even such a simple graph question. As a sort of cliffhanger, note how none of this even yet touches upon runtime or memory usage! :) - Agor

Agor
Thanks, this is a self-education question and I appreciate the detail, I value your bringing the assumptions I made to light.I'm working with undirected graphs and my goal to produce a method which takes one graph and returns a collection of all the subgraphs.
Ollie G
A: 

I ran into a similar problem where I wanted all the weakly connected subgraphs of a directed graph. I blogged about it here. I used the JUNG API and compare two approaches. My first approach could be used as a template to solve your problem.

harschware