views:

505

answers:

2

Is there another way to calculate the shortest path for a near complete graph other than Dijkstra? I have about 8,000 nodes and about 18 million edges. I've gone through the thread "a to b on map" and decided to use Dijkstra. I wrote my script in Perl using the Boost::Graph library. But the result isn't what I expected. It took about 10+ minutes to calculate one shortest path using the call $graph->dijkstra_shortest_path($start_node,$end_node);

I understand there are a lot of edges and it may be the reason behind the slow running time. Am I dead in the water? Is there any other way to speed this up?

A: 

Do the edges have lengths, or are you just trying to find a path with the minimum number of edges?

10+ minutes is a long time for only 18M edges. Try doing some profiling to see what functions are taking so long.

jcd
They are positive weighted edges. Yes, I'm trying to find the minimum weight/cost. I agreed, it's taking too long. I've profiled everything up to the Boost Library.
Jason
It sounds like perl is just taking too long to feed the data to boost then. Write a short program in another language to do this step for you and call that from perl. Boost's implementation of dijkstra should be fine.
jcd
A: 

Short answer: Dijkstra's is your best bet if you want just a few shortest paths, and the Floyd-Warshall algorithm is better if you want to find the shortest paths between every pair of nodes.

  • Dijkstra's algorithm finds the shortest paths from one source to all other nodes in the graph, for weighted graphs. It operates on dense graphs in O(V^2) time.

  • Floyd-Warshall finds shortest paths between all pairs of nodes. It requires a dense representation and runs in O(V^3) time. It operates on weighted or unweighted graphs.

Even though your graph is dense (according to the title of your question), there might be some benefit to converting it to a sparse graph and using a sparse implementation of Dijkstra's if you just want to find a few shortest paths. Sparse Dijkstra's runs in O(E log V).

Please note that this is assuming that all your edge weights are non-negative; if they are, then you can't use any of these. You would have to use an even slower algorithm, like Bellman-Ford.

Greg
1. Dense, adjaceny matrix. They're pairwise to be exact, but does not quite form a complete graph. 2. Yes, they have positive weight.3. Ideally all. But at this point I'll be happy to take random samples or as many as I can get.Of the 3 you mentioned Dijkstra fits my problem that best. But the performance doesn't back up the claimed running time. I'm still lost for what could be the bottleneck.
Jason
If your weights can be compressed to 8 or 16 bits with integer representation, perhaps you could save some space and thus time?
Greg
Also, if the graph is undirected, then you should be able to reduce the memory by half by using only the upper triangle of the matrix. I focus on memory here because you might be having problems with data throughput.
Greg
I just implemented Dijkstra on a randomly created matrix-based graph with the same density as your graph. Mine created the graph and ran Dijkstra's algorithm 100 times from different source nodes in 40 seconds. So less than 0.5 seconds per call to Dijkstra. This was C++/Linux/3.0GHz xeon/8GB memory.
Greg
Greg, thanks for trying it out. You're probably correct about the data throughput. My process is showing about 2.4GB of virtual memory on a 2.6GHz Xeon with 16GB. Not too far off from your spec. I'm also starting to suspect Perl is not as efficient as I thought it would be. I choose it mostly for its ease of raw data processing.
Jason
Use dense Dijkstra, not sparse.
jcd
@Jason, my program uses only about 250 MB of virtual memory while running. That's an order of magnitude difference; you should find out why that is happening.I also ran Floyd-Warshall on the same graph, and it took 19 minutes to compute all the shortest paths, or about .14 seconds per shortest path.So I guess I would answer your original question as "write your own," because Boost might be doing something inefficient (but I've never worked with it). My Dijkstra implementation is only about 32 lines, and Floyd-Warshall is about 16. I can send them to you if you want.
Greg
Okay, I'm spending too much time on this -- but it's interesting. I ported my code to perl and ran it, and it took 1.2 GB of memory (using a dense representation of arrays of arrays), and about 48 seconds to run Dijkstra's ONE time -- about 80 times slower than my C++ implementation. Perhaps the problem is Perl, or something about the way the data is stored in Perl. I also tried a single-array version (indexing with multiplication offsets), but it seemed no better (same memory use, slower run time by about 5 seconds).
Greg
Greg, thank you so much for confirming the problems with perl. I just wrote the whole thing again in C++ and oh WOW. 50seconds to find a path for each pair! Also, just for kick I went back to my perl script and forced it to read the weight as "int(weight)". The memory went down and the processing went down to ~2 minutes. I'm guessing some how perl stored my integer as string and is doing string comparison!Thanks again for all your help!
Jason
@Greg, could you email me your C++ implementation (congfoo at gmail dot com)? I would like to play around with it and compare it against Boost. Thanks a bunch!
Jason