I need to compute the shortest paths between every pair of nodes in a graph. I have about 70k nodes and 250k edges, and the graph isn't necessarily connected. Obviously using an efficient algorithm is crucial. What do you recommend?

As a side note, I would appreciate advice on how to divide the task up between several machines--is that even possible with this kind of problem?

Thanks

+4  A: 

MapReduce is a great distributed framework for this, though it might be a little too high-powered. If you're interested in that, take a look at this lecture or maybe this blog post for inspiration. (In fact, when I was taught MapReduce, this was one of the first examples.)

With 250k edges and 70k nodes the graph is relatively sparse. Dijkstra's algorithm runs in O(E + V log V) per source (with a Fibonacci heap), so the full all-sources running time is O(VE + V^2 log V). That should be fast enough, but the usual caveat for Dijkstra's applies: no negative edge weights.
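A minimal sketch of that repeated-Dijkstra idea in plain Python, assuming the graph is stored as an adjacency-list dict (note a binary heap gives O(E log V) per source rather than the Fibonacci-heap bound above):

```python
# Sketch: all-pairs shortest paths by running Dijkstra from every source.
# Assumes adj = {node: [(neighbor, weight), ...]}; every node appears as a key.
import heapq

def dijkstra(adj, source):
    """Single-source shortest path lengths with a binary heap."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist  # nodes absent from dist are unreachable from source

def all_pairs(adj):
    # Independent per-source runs; this loop is what you would farm out to workers.
    return {s: dijkstra(adj, s) for s in adj}
```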

You can also take a look at Johnson's algorithm if your problem has negative edge weights but no negative cycles. It distributes just as well, since after reweighting the graph it simply runs Dijkstra's algorithm from each node.
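If you don't want to implement the reweighting yourself, SciPy's sparse graph module has a shortest-path routine that can run Johnson's algorithm. A rough sketch (the tiny edge list is made up; check the csgraph API against your SciPy version):

```python
# Sketch: Johnson's algorithm via SciPy (negative edges OK, negative cycles not).
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

# Hypothetical directed edge list: (u, v, weight) triples.
edges = [(0, 1, 2.0), (1, 2, -1.0), (0, 2, 4.0)]
rows, cols, weights = zip(*edges)
n = 3  # number of nodes

graph = csr_matrix((weights, (rows, cols)), shape=(n, n))

# method="J" selects Johnson's algorithm.  The result is a dense n x n matrix,
# so at 70k nodes that's roughly 39 GB of doubles -- compute it in row blocks
# with the `indices` argument if it doesn't fit in memory.
dist = shortest_path(graph, method="J", directed=True)
print(dist)
```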

Larry
The latter links don't show up: http://en.wikipedia.org/wiki/Dijkstra's_algorithm and http://en.wikipedia.org/wiki/Johnson's_algorithm
Larry
@Larry: You should read about the markdown syntax for how to format your post. +1 anyway from me.
Mark Byers
@Larry: Read here: http://stackoverflow.com/editing-help
Mark Byers
I mean, it seems like there's a link limit.
Larry
+3  A: 

You could use the Floyd-Warshall algorithm. It solves exactly this problem.

The complexity is O(V^3).
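For reference, a minimal Floyd-Warshall sketch over a dense distance matrix (illustration only: at 70k nodes even the V^2 matrix is enormous, never mind the V^3 work):

```python
# Sketch: in-place Floyd-Warshall on a dense distance matrix.
# dist[i][j] starts as the weight of edge i->j, INF if there is no edge, 0 on the diagonal.
INF = float("inf")

def floyd_warshall(dist):
    n = len(dist)
    for k in range(n):                # allow node k as an intermediate hop
        dk = dist[k]
        for i in range(n):
            dik = dist[i][k]
            if dik == INF:
                continue              # nothing reachable through k from i
            row = dist[i]
            for j in range(n):
                nd = dik + dk[j]
                if nd < row[j]:
                    row[j] = nd
    return dist

# Tiny example: edges 0->1 (weight 5) and 1->2 (weight 2).
d = [[0, 5, INF],
     [INF, 0, 2],
     [INF, INF, 0]]
print(floyd_warshall(d)[0][2])        # 7
```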

There is also Johnson's algorithm with a complexity of O(V^2 log V + VE). The latter is also easy to distribute because it runs Dijkstra's algorithm V times, and those runs can be done in parallel.
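A rough single-machine sketch of that parallelization, splitting the source nodes into chunks over a process pool (the random graph and chunk size are placeholders, and it leans on SciPy's csgraph Dijkstra; the same per-source partitioning works across machines):

```python
# Sketch: parallelize "Dijkstra from every node" by chunking the source nodes.
import numpy as np
from multiprocessing import Pool
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

def build_graph(n=1000, seed=0):
    # Hypothetical random sparse graph standing in for the real 70k-node one.
    rng = np.random.default_rng(seed)
    rows = rng.integers(0, n, size=5 * n)
    cols = rng.integers(0, n, size=5 * n)
    weights = rng.random(5 * n) + 0.1
    return csr_matrix((weights, (rows, cols)), shape=(n, n))

GRAPH = build_graph()

def sources_chunk(sources):
    # Each worker computes shortest paths from its own block of source nodes.
    return dijkstra(GRAPH, directed=True, indices=sources)

if __name__ == "__main__":
    n = GRAPH.shape[0]
    chunk = 100
    chunks = [list(range(i, min(i + chunk, n))) for i in range(0, n, chunk)]
    with Pool() as pool:
        blocks = pool.map(sources_chunk, chunks)
    dist = np.vstack(blocks)          # full n x n distance matrix
    print(dist.shape)
```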

Mark Byers
Hm. But it has `O(n^3)` complexity. It may not be very efficient.
dirkgently
@dirkgently: In fact, the complexity would be something like O(V^2 + V*E). That is not rocket fast, but you probably can't do much better if you want V^2 outputs.
jpalecek
@jpalecek: I was referring to the original post and Floyd-Warshall in particular.
dirkgently
Wow, Floyd-Warshall is a concise algorithm. If I didn't have so many nodes, I would definitely pick it.
awegawef
+1  A: 

There are two naive ways of parallelizing this problem:
1) Identify the connected components and distribute them over different machines; the path length between two nodes from different components is undefined. (See the union-find sketch below.)

2) Load the graph onto every machine and give each machine a list of nodes for which to compute all shortest paths. The results for one node do not depend on the results for any other node, so the work parallelizes cleanly.

Upside: this is not too hard to implement, but I would only do it this way if you have to solve the problem once. If it is a recurring issue, you might want to look at proper distributed algorithms.
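A small union-find sketch of option 1, splitting the node set by connected component (the tiny edge list at the bottom is just for illustration):

```python
# Sketch: group nodes by connected component so each machine gets one component.
def components(num_nodes, edges):
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u, v in edges:                      # treat edges as undirected for the split
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    groups = {}
    for node in range(num_nodes):
        groups.setdefault(find(node), []).append(node)
    return list(groups.values())            # one node list per machine

print(components(6, [(0, 1), (1, 2), (3, 4)]))   # [[0, 1, 2], [3, 4], [5]]
```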

Use igraph: it's written in C, it's pretty fast, and you can use Python as a wrapper language.
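A quick example of what that looks like with python-igraph (the edge list and weights are made up; recent versions call the all-pairs routine `distances`, older ones `shortest_paths`):

```python
# Sketch with python-igraph: the C core does the heavy lifting.
import igraph as ig

edges = [(0, 1), (1, 2), (2, 3), (4, 5)]   # hypothetical edge list
weights = [1.0, 2.5, 1.0, 0.5]

g = ig.Graph(n=6, edges=edges, directed=False)
g.es["weight"] = weights

# All-pairs distances as a list of lists; unreachable pairs come back as inf.
dist = g.distances(weights="weight")
print(dist[0])
```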

DrDee
A: 

Look at papers/publications which have the following keywords: distributed graph search algorithms. Here's one that may be of help.

There's also this paper (ACM account required): Distributed computation on graphs: shortest path algorithms

dirkgently