C++ programming for clusters and HPC

+6 Q:

C++ programming for clusters and HPC

I need to write a scientific application in C++ that does a lot of computation and uses a lot of memory. Part of the work is done, but because of the high resource requirements I was thinking of moving to OpenMPI.

Before doing that, I have a simple question: if I understood the principle of OpenMPI correctly, it is the developer who has the task of splitting the job across the different nodes, calling SEND and RECEIVE based on the nodes available at that time.

Do you know of any library, OS, or anything else that provides this capability while letting my code remain as it is now? Basically, something that connects all the computers and lets them share their memory and CPUs as if they were one machine?

I am a bit confused because of the huge volume of material available on the topic. Should I look at cloud computing? Or at distributed shared memory?

+1 A:

If message passing is holding you back, try distributed objects. There are a lot of distributed object frameworks available: CORBA, DCOM, and ICE, to name a few. If you choose to distribute your objects, they will have global visibility through the interfaces (both data and methods) that you define. Any object on any node can access these distributed objects.

I have been searching for software that allows distributing memory, but haven't come across any. I guess it's because all these distributed object frameworks are available, so people don't need to distribute memory as such.

Sundar 2010-03-30 21:15:37

Great! I have heard a lot about these technologies from an old friend of mine who used them. I will ask and start having a look!

Abruzzo Forte e Gentile 2010-03-31 15:41:43

+2 A:

Currently there is no C++ library or utility that will automatically parallelize your code across a cluster of machines. Granted, there are many other ways to achieve distributed computing, but you really want to optimize your application to use message passing or distributed shared memory.

Your best bets would be to:

Convert your implementation into a task-based solution. There are a lot of ways to do this, but it will most definitely have to be done by hand.
Clearly identify where you can break the tasks up and how these tasks communicate with each other.
Use a higher-level library that builds on Open MPI/MPICH -- Boost.MPI comes to mind.

Implementing a parallel distributed solution is one thing; making it work efficiently is another. Read up on different topologies and parallel computing patterns to make implementing solutions a little less painful than if you had to start from scratch.

Dean Michael 2010-03-31 09:58:04

HI! Many Thanks for your response.

Abruzzo Forte e Gentile 2010-03-31 15:39:54

+1 A:

I had a good experience using Top-C in graduate school.

From the home page: "TOP-C especially distinguishes itself as a package to easily parallelize existing sequential applications."

http://www.ccs.neu.edu/home/gene/topc.html

Edit: I should add that it's much simpler to parallelize a program if it uses "trivial parallelism", i.e. the nodes don't need to share memory. MapReduce is built on this concept. If you can minimize the amount of shared state your nodes use, you'll see much larger gains from parallel processing, potentially by orders of magnitude.

Stephen 2010-03-31 14:27:37

Hi! Many thanks for your response. I didn't know about this project! I will have a look!

Abruzzo Forte e Gentile 2010-03-31 15:40:59

+2 A:

Well, you haven't actually stated what hardware you are targeting. If it's a shared-memory machine, then OpenMP is an option. Most parallel programmers would regard parallelisation with OpenMP as an easier option than using MPI in any of its incarnations. I'd also suggest that it is easier to retrofit OpenMP to an existing code than MPI. The best, in the sense of best-performing, MPI programs are those designed from the ground up to be parallelised with message-passing. For one thing, the best sequential algorithm is not always the best algorithm for a problem once it has been parallelised. Sometimes a simple but sequentially sub-optimal algorithm is a better choice.

You may have access to a shared-memory computer:

all multicore CPUs are effectively shared-memory computers;
on a lot of clusters the nodes are two or four CPUs strong; if each CPU has 4 cores, then you might have a 16-core shared-memory machine on your cluster;
if you have access to an MPP supercomputer you will probably find that each of its nodes is a shared-memory computer.

If you are stuck with message-passing, then I'd strongly advise you to stick with C++ and OpenMPI (or whatever MPI is already installed on your system), and you should definitely look at Boost.MPI too. I advise this strongly because, once you step outside the mainstream of high-performance scientific computing, you may find yourself in an army of one, programming with an idiosyncratic collection of just-fit-for-research libraries and other tools. C++, OpenMPI and Boost are sufficiently well used that you can regard them as being of 'weapons-grade', or whatever your preferred analogy might be. There's little enough traffic on SO, for example, on MPI and OpenMP; check out the stats on the other technologies before you bet the farm on them.

If you have no experience of MPI, then you might want to look at a book called Parallel Scientific Computing in C++ and MPI, by Karniadakis and Kirby. Using MPI, by Gropp et al., is OK as a reference, but it's not a beginner's text on programming for message-passing.

High Performance Mark 2010-04-01 10:02:57
