views: 990
answers: 5

I have to implement an MPI system on a cluster. If anyone here has experience with MPI (MPICH/OpenMPI), I'd like to know which is better and how performance can be boosted on a cluster of x86_64 boxes.

+3  A: 

MPICH has been around a lot longer. It's extremely portable and you'll find years' worth of tips and tricks online. It's a safe bet and it's probably compatible with more MPI programs out there.

OpenMPI is newer. While it's not quite as portable, it supports the most common platforms really well. Most people seem to think it's a lot better in several regards, especially for fault-tolerance - but to take advantage of this you may have to use some of its special features that aren't part of the MPI standard.

As for performance, it depends a lot on the application; it's hard to give general advice. You should post a specific question about the type of calculation you want to run, the number of nodes, and the type of hardware - including what type of network hardware you're using.

dmazzoni
A: 

We used MPICH simply because it seemed the most available and best documented; we didn't put a lot of effort into testing alternatives. MPICH has reasonable tools for deployment on Windows.
The main performance issue we had was that we needed to ship the same base data to all nodes, and MPICH doesn't (or didn't) support broadcast - so deploying the initial data was O(n).

Martin Beckett
+4  A: 

I've written quite a few parallel applications for both Windows and Linux clusters, and I can advise you that right now MPICH2 is probably the safer choice. It is, as the other responder mentions, a very mature library. Also, there is ample broadcasting support (via MPI_Bcast) now, and in fact, MPICH2 has quite a few really nice features like scatter-and-gather.
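
For anyone unfamiliar with those calls, here's a minimal sketch (my own illustration, not code from this answer - the buffer sizes and values are arbitrary) that broadcasts a parameter from rank 0 with MPI_Bcast and then scatters chunks of an array with MPI_Scatter. It should build with mpicc against either MPICH2 or OpenMPI:

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Broadcast a single parameter from rank 0 to everyone. */
        int chunk = 4;   /* elements per rank (arbitrary example value) */
        MPI_Bcast(&chunk, 1, MPI_INT, 0, MPI_COMM_WORLD);

        /* Rank 0 prepares size*chunk values; each rank receives its own chunk. */
        double *all = NULL;
        if (rank == 0) {
            all = malloc(size * chunk * sizeof(double));
            for (int i = 0; i < size * chunk; i++)
                all[i] = (double)i;
        }
        double *mine = malloc(chunk * sizeof(double));
        MPI_Scatter(all, chunk, MPI_DOUBLE,
                    mine, chunk, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        printf("rank %d got first element %g\n", rank, mine[0]);

        free(mine);
        free(all);
        MPI_Finalize();
        return 0;
    }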

OpenMPI is gaining some ground, though. Penguin Computing (they're a big cluster vendor, and they like Linux) actually has some really strong benchmarks where OpenMPI beats MPICH2 hands down in certain circumstances.

Regarding your comment about "boosting performance", the best piece of advice I can give is to never send more data than absolutely necessary if you're I/O bound, and never do more work than necessary if you're CPU bound. I've fallen into the trap of optimizing the wrong piece of code more than once :) Hopefully you won't follow in my footsteps!

Check out the MPI forums - they have a lot of good info about MPI routines, and the Beowulf site has a lot of interesting questions answered.

Mike
A: 

'Better' is hard to define... 'Faster' can be answered by benchmarking it with your code and your hardware. Things like collective & offload optimisation will depend on your exact hardware and are also quite variable with regard to driver stack versions; Google should be able to find you working combinations.
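
If you want rough numbers before benchmarking your full application, a simple ping-pong test is a quick first step. The sketch below is my own addition (the message size and repetition count are arbitrary); it measures approximate point-to-point latency and bandwidth between ranks 0 and 1, so run it with at least two processes:

    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>

    #define NBYTES (1 << 20)   /* 1 MiB message (arbitrary size) */
    #define REPS   100

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        static char buf[NBYTES];
        memset(buf, 0, sizeof buf);

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();

        /* Ranks 0 and 1 bounce a message back and forth REPS times. */
        for (int i = 0; i < REPS; i++) {
            if (rank == 0) {
                MPI_Send(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }

        double t1 = MPI_Wtime();
        if (rank == 0) {
            double per_msg = (t1 - t0) / (2.0 * REPS);   /* one-way time per message */
            printf("avg one-way time: %g s, bandwidth: %g MB/s\n",
                   per_msg, NBYTES / per_msg / 1e6);
        }

        MPI_Finalize();
        return 0;
    }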

As far as optimisation work goes, that depends partly on the code and partly on the hardware.

Is your code I/O bound to storage? If so, investigating something better than NFS might help a lot, as might using MPI I/O rather than naive parallel I/O.
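
To make the MPI I/O suggestion concrete, here's a sketch of my own (the file name and data sizes are made up): every rank writes its block into one shared file at its own offset via a collective MPI-IO call, instead of each rank doing independent POSIX writes over NFS.

    #include <mpi.h>

    #define N 1024   /* doubles per rank (arbitrary) */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double data[N];
        for (int i = 0; i < N; i++)
            data[i] = rank;   /* dummy payload */

        /* All ranks open the same file collectively... */
        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "output.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        /* ...and each writes its block at its own offset in one collective call,
           which lets the MPI-IO layer aggregate and schedule the writes. */
        MPI_Offset offset = (MPI_Offset)rank * N * sizeof(double);
        MPI_File_write_at_all(fh, offset, data, N, MPI_DOUBLE, MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }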

If you are network bound, then looking at communication locality and comms/compute overlap can help. Most of the various MPI implementations have tuning options for using local shared memory rather than the network for intranode comms, which for some codes can reduce the network load significantly.
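
The usual way to get comms/compute overlap is nonblocking point-to-point: post MPI_Irecv/MPI_Isend, do work that doesn't touch the in-flight buffers, then wait. A minimal ring-exchange sketch (my own illustration, with arbitrary buffer sizes):

    #include <mpi.h>
    #include <stdio.h>

    #define N 100000

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int next = (rank + 1) % size;
        int prev = (rank + size - 1) % size;

        static double send_buf[N], recv_buf[N], local[N];
        for (int i = 0; i < N; i++) { send_buf[i] = rank; local[i] = i; }

        /* Post the communication first... */
        MPI_Request reqs[2];
        MPI_Irecv(recv_buf, N, MPI_DOUBLE, prev, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(send_buf, N, MPI_DOUBLE, next, 0, MPI_COMM_WORLD, &reqs[1]);

        /* ...then do compute that doesn't need recv_buf while the messages move. */
        double partial = 0.0;
        for (int i = 0; i < N; i++)
            partial += local[i] * local[i];

        /* Only now block until the exchange has finished. */
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

        partial += recv_buf[0];   /* safe to use received data after the wait */
        printf("rank %d partial=%g\n", rank, partial);

        MPI_Finalize();
        return 0;
    }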

Segregation of I/O and MPI traffic can make a big difference on some clusters, particularly for gigabit ethernet clusters.

Funcan
A: 

MPICH2 is the best.

osgx