views:

1971

answers:

9

I have a large, connected, sparse graph in adjacency-list form. I would like to find two vertices that are as far apart as possible, that is, the diameter of the graph and two vertices achieving it.

I am interested in this problem in both the undirected and directed cases, for different applications. In the directed case, I of course care about directed distance (the shortest directed path from one vertex to another).

Is there a better approach than computing all-pairs shortest paths?

Edit: By "as far apart as possible", I of course mean the "longest shortest path" -- that is, the maximum over all pairs of vertices of the shortest distance from one to the other.

+3  A: 

Edit I'm undeleting again, simply so I can continue commenting. I have some comments on Johnson's Algorithm below this answer. - Aaron

My original comment : I too am curious about this problem, but don't have an answer. It seems related to the Minimum Spanning Tree, the subgraph connecting all vertices but having fewest (or lowest weight) edges. That is an old problem with a number of algorithms; some of which seem quite easy to implement.

I had initially hoped that the diameter would be obvious once the MST had been found, but I'm losing hope now :-( Perhaps the MST can be used to place a reasonable upper bound on the diameter, which you can use to speed up your search for the actual diameter?

Aaron McDaid
Finding the MST looks like a very good first step, but I we can NOT assume that the diameter path passes through the MST. I can think of a simple example that shows that. Unfortunately, I can't draw it here.
jpbochi
That is true. But the diameter of the MST cannot be shorter than the diameter of the graph as a whole. Therefore it places an upper bound, but not a lower bound, on the diameter of the graph. However, I must admit that such an upper bound may not be very useful.
Aaron McDaid
By the way, I'm new to stack overflow and I probably should have put my original comment in as a 'comment', not as an answer. I wasn't intending to claim to have an answer, I just wanted to join the discussion. I see two users ( dlamblin and jrockway ) have managed to post comments, not answers, directly to the question; but I can't see such an option.Apologies ...
Aaron McDaid
Thanks for that clarification A.Rex. I'll delete my answer now then, I hope that'll increase exposure of the question again. I'm guessing it'll also delete some of these comments though :-(
Aaron McDaid
@A. Rex: Do you have weights in your graph, and are any of them negative? Johnson's algorithm (according to Wikipedia) just transforms the data to remove the negative weights, then perform Dijkstra's algorithm on each node in turn.So assuming you have non-negative (and perhaps all equal) weights, it appears you have to do something like a brute force involving Dijkstra's algorithm anyway.
Aaron McDaid
@Aaron McDaid: No, none of my weights are negative, so yes Johnson's algorithm is just Dijkstra V times. My question is whether I can avoid all-pairs shortest paths.
A. Rex
+7  A: 

Well, I've put a little bit of thought on the problem, and a bit of googling, and I'm sorry, but I can't find any algorithm that doesn't seem to be "just find all pairs shortest path".

However, if you assume that Floyd-Warshall is the only algorithm for computing such a thing (Big-Theta of |V|^3), then I have a bit of good news for you: Johnson's Algorithm for Sparse Graphs (thank you, trusty CLRS!) computes all pairs shortest paths in (Big-Oh (|V|^2 * lgV + VE)), which should be asymptotically faster for sparse graphs.

Wikipedia says it works for directed (not sure about undirected, but at least I can't think of a reason why not), here's the link.

Is there anything else about the graph that may be useful? If it can be mapped easily onto a 2D plane (so, its planar and the edge weights obey the triangle inequality [it may need to satisfy a stricter requirement, I'm not sure]) you may be able to break out some geometric algorithms (convex-hull can run in nlogn, and finding the farthest pair of points is easy from there).

Hope this helps! - Agor

Edit: I hope the link works now. If not, just google it. :)

Agor
Thanks for the comments. I was aware of Johnson's algorithm, but I suppose it's a good idea to have it here for posterity anyway. My graphs cannot be naturally embedded in low-dimensional spaces in any way.
A. Rex
+1 for CLR(S) ! and an undirected graph is just a directed graph where all of the edges are doubled, one in each direction!
Brian Postow
+1  A: 

I don't know of a better method for computing diameter other than all shortest paths, but Mathematica uses the following approximation for PseudoDiameter:

  • A graph geodesic is the shortest path between two vertices of a graph. The graph diameter is the longest possible length of all graph geodesics of the graph. PseudoDiameter finds an approximate graph diameter. It works by starting from a vertex u, and finds a vertex v that is farthest away from u. This process is repeated by treating v as the new starting vertex, and ends when the graph distance no longer increases. A vertex from the last level set that has the smallest degree is chosen as the final starting vertex u, and a traversal is done to see if the graph distance can be increased. This graph distance is taken to be the pseudo-diameter.

http://reference.wolfram.com/mathematica/GraphUtilities/ref/PseudoDiameter.html

job
Thanks! That's definitely a plausible heuristic.
A. Rex
In the undirected non-negative weight case, would this algorithm find the actual diameter of the graph? In the directed case I can think of examples that would cause the real diameter to not be found, but I can't imagine them for the undirected case. I'm tempted to start writing code.
Bribles
@Bribles For the directed case, I'd imagine you'd have to do two searches at each node. One forward (following links source -> dest) and one backward (dest -> source) so that you didn't get stuck in a node with no in/out links. Then you'd just take the longer path. Is that the issue you have with directed graphs? I have no proofs as to how well this performs, but I imagine it would work quite well.
job
@job My real question is for undirected graphs if the pseudodiameter would in fact be the real diameter not just an approximation? And if that's not the case, what's an example of a undirected graph where the above listed PseudoDiameter finding algorithm does not find the true diameter?
Bribles
A: 

A dirty method:

We know that for a graph G(V,E) with |V|=n and |E|=m, Dijkstra algorithm runs in O(m+nlogn) and this is for a single source. For your all-pairs problem, you need to run Dijkstra for each node as a starting point.

However, if you have many machines, you can easily parallel this process.

This method is easiest to implement, definitely not very good.

Yin Zhu
The key question is if I can do better than computing all-pairs shortest paths, whether sequentially or in parallel.
A. Rex
+2  A: 

Here's some thoughts on doing better than all pairs shortest paths in an undirected graph, although I'm not sure just how much of an improvement it would be.

Here's a subroutine that will find two nodes distance D apart, if there are any. Pick an arbitrary node x and compute M[x] = maximum distance from x to any other node (using any single source shortest path algorithm). If M[x] >= D, then x is one of our nodes and the other is easy to find. However, if M[x] < D, then neither endpoint we're looking for can be less than distance D - M[x] from x (because there are paths from that node to all other nodes, through x, of distance < D). So find all nodes of distance less than D-M[x] from x and mark them as bad. Pick a new x, this time making sure we avoid all nodes marked as bad, and repeat. Hopefully, we'll mark lots of nodes as bad so we'll have to do many fewer than |V| shortest path computations.

Now we just need to set D=diam(G) and run the above procedure. We don't know what diam(G) is, but we can get a pretty tight range for it, as for any x, M[x] <= diam(G) <= 2M[x]. Pick a few x to start, compute M[x] for each, and compute upper and lower bounds on diam(G) as a result. We can then do binary search in the resulting range, using the above procedure to find a path of the guessed length, if any.

Of course, this is undirected only. I think you could do a similar scheme with directed graphs. The bad nodes are those which can reach x in less than D-M[x], and the upper bound on diam(G) doesn't work so you'd need a larger binary search range.

Keith Randall
Thanks. This answer is at least promising in that it suggests an alternate algorithm. I wonder what the performance is ...
A. Rex
A: 

I really doubt that there is any method of finding the longest-shortest path without having to use some sort of all pairs shortest path algorithm (finding single source shortest path repeatedly is basically doing all pairs in the worst case).

'Diameter' becomes hard to define in terms of the 'longest path' if the graph is not a tree or a DAG. The 'longest' path can be infinite if there are cycles in the graph. Hence a simple traversal of the graph cannot yield the longest path over all nodes. Since you have already stated that your graph is not necessarily acyclic, and you are interested in the "longest shortest" path, there does not seem to be any method that can avoid finding the shortest path for all nodes. Using Johnson's Algorithm, as Agor suggested, is probably best for doing that.

You can of course go with a heuristics based approach. The algorithm that uses pseudo-peripheral vertex appears to be the most commonly used approach.

MAK
Re "The definition of 'diameter' becomes meaningless if the graph is not a tree or a DAG": That's not true. Read the Wikipedia link for the standard definition of "diameter", which doesn't care if the graph is acyclic.
A. Rex
Yup: You can't run through cycles as long as You like, just to increase the length(edge wise) of the path, because then it surely is no shortest(weigh wise) path any more.
Dave
@A. Rex: You are right. I have edited my post to correct the wording.
MAK
A: 

Forgive me if my answer is not correct in terms of syntax but my Algorithm course was a while ago (and not in English).

If I understand your problem correctly, you want to know what is the highest number you can count up to by starting from node A and reaching node B without "retracing" your steps. If this is the case, I would imagine your graph as acyclic (the cyclic option comes later).

First of all, the upper limit is the number of edges. How I see the thing is: take one node, create a tree where the node is at the root and each subsequent node you can reach is at the next level. The height of the tree you build is the diameter, and the leaves are the nodes that are at that distance. If that distance = number of edges you're done. If not, pick another node and repeat.

I think it's similar to the building of a breadth-first search. Not knowing much else about the graph you could employ some heuristics to see which tree orientation (i.e. which node should be picked first) would be better, but that's another topic.

Regarding the cyclicity of the graph -- as others have pointed out those can lead to infinite loops. A way to get rid of those could be to 'rule out' the nodes that belong to a cycle and adding the longest path between them as the value you'll get by entering the cycle and coming out of it, touch each node only once.

Now, as I said this method could very easily be the same as doing all-paires shortest path. Worst case complexity is certainly the same, and could not be otherwise.

lorenzog
A: 

One way to obtain an estimate of this number is to start at a random point, and do a breadth-first "grassfire" algorithm, marking the shortest distance to each node. The longest distance here is your estimate.

Running this extremely fast algorithm multiple times with different starting points and then taking the maximum will increase the accuracy of the estimate, and, of course, give you a decent lower bound. Depending on the distribution and connectivity of your graph, this estimate may even be accurate!

If your graph is large enough, asymptotic analysis of how the estimate changes as you add more samples might allow you to project to an even better guess.

If you're interested in an exact answer, it seems unlikely that you can get away with cutting corners too much unless your graph is easy to partition into components that are weakly connected with each other - in which case you can restrict your search to shortest path between all pairs of vertices in different components.

Justin W
A: 

Not sure if it fits the bill, but interesting:

HADI: Fast Diameter Estimation and Mining in Massive Graphs with Hadoop

U. Kang, C. Tsourakakis, A. P. Appel, C. Faloutsos, J. Leskovec, “HADI: Fast Diameter Estimation and Mining in Massive Graphs with Hadoop”, CMU ML Tech Report CMU-ML-08-117, 2008.

GalacticJello