views: 206
answers: 5

I want to write a code converter that takes an OpenMP-based parallel program and runs it on a cluster.

How do I go about this problem? What libraries do I use? How do I set up a small cluster for this?

I'm finding it extremely hard to find good material about cluster computing on the internet.

EDIT: If it's impossible, then how does Intel do it? The Intel compiler seems to do exactly what I want. I don't have any specific application that I would like to run; I want to write the "converter/compiler", not the application. I understand that shared memory is different from distributed memory, but there has to be a way to sync memory, if not for all cases then for some specific ones, even if it means the application has to be written with custom constructs.

+1  A: 

This is simply not possible. You have to structure your code in a completely different way to get it to work on a cluster (programming multiple machines is very different from programming one machine).

There is no magic pixie dust to do this.

On the other hand, if you write your program with clusters in mind, it is possible to run it on a single machine (although it will obviously be slower).

Zifre
+1  A: 

Intel has an implementation of OpenMP for clusters (Cluster OpenMP) that works with their C++ and Fortran compilers on x86 64-bit clusters. You can get a 30-day eval version of these compilers for free.

Other than that, Zifre is mostly right. If you are concerned with scalability, bite the bullet and write your parallel program in another programming model (MPI, CUDA, Cilk, ...) that is designed with distributed memory in mind. If you provide a little more information about your application, we may be able to offer more useful guidance on that front.
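For a flavor of the MPI model, here is a minimal sketch of a distributed sum; it assumes an MPI installation such as Open MPI or MPICH and is not tied to any particular application:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;
        long local = 0, total = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each process sums its own slice of 1..100. */
        for (int i = rank + 1; i <= 100; i += size)
            local += i;

        /* Communication is explicit: combine the partial sums on rank 0. */
        MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum = %ld\n", total);

        MPI_Finalize();
        return 0;
    }

Compile with mpicc and launch with, e.g., mpirun -np 4 ./sum. The same binary runs as four processes on a single machine, which is Zifre's point about cluster-style programs still working on one box.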

Matt J
A: 

What if I just want to convert the main code into parts that can be run on separate nodes of a cluster? Maybe just something like loop division/optimisation... can't that be done?

+2  A: 

It seems to me that this is not a good idea.

The basic idea behind OpenMP is shared-memory parallel execution. It works well when accessing shared data costs you almost nothing: every thread can read a variable directly from shared cache or RAM.
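For example (a minimal sketch, not from the answer itself), a plain OpenMP loop touches the shared arrays through ordinary loads and stores, with no communication code anywhere; compile with gcc -fopenmp:

    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static double a[N], b[N];

        for (int i = 0; i < N; i++)
            b[i] = i;

        /* All threads share a[] and b[] through common RAM/cache;
           each element access is just an ordinary memory load or store. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[i] = 2.0 * b[i];

        printf("a[N-1] = %f\n", a[N - 1]);
        return 0;
    }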

Cluster computations rely on message passing, because the computers in a cluster have distributed memory. When one process needs data from another, that data has to be sent over the network, which is a time-consuming operation.

So, if you wanted to write such a compiler, it would have to emit a data-transfer operation (e.g. MPI_Bcast from MPI) for every shared-data access in the OpenMP program. That would kill parallel performance entirely.
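As a hedged sketch of what such generated code would have to look like (the variable x is purely illustrative), every read of a "shared" variable on the other ranks must be preceded by an explicit broadcast over the network:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank;
        double x = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Rank 0 updates the "shared" variable... */
        if (rank == 0)
            x = 42.0;

        /* ...and the generated code must ship it over the network
           before any other rank may read it. Under OpenMP this
           would have been a plain memory load. */
        MPI_Bcast(&x, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        printf("rank %d sees x = %f\n", rank, x);

        MPI_Finalize();
        return 0;
    }

Doing that for every access inside an inner loop turns nanosecond memory references into network round trips that are orders of magnitude slower.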

Vova
+1  A: 

SCORE/SCASH and the Omni OpenMP compiler. Omni can compile OpenMP programs to run on SCASH, a software distributed shared memory system for SCore clusters, which gives you OpenMP on a cluster without rewriting the application.