I have the following problem:

Program 1 holds a huge amount of data, say 10 GB, consisting of large integer and double arrays. Program 2 has 1..n MPI processes that use tiles of this data to compute results.

How can I send the data from program 1 to the MPI Processes?

Using file I/O is out of the question. The compute node has sufficient RAM.

A: 

It should be possible, depending on your MPI implementation, to run several different programs in the same MPI job. For instance, with Open MPI you can run

 mpirun -n 1 big_program : -n 20 little_program

and both programs will share MPI_COMM_WORLD. From there you can use the usual MPI functions to pass your data from the big program to the little ones.

Scott Wales
I do this and it works great (using Microsoft's MS-MPI). You may want to make use of the MPI_APPNUM attribute (exposed in MS-MPI as the environment variable PMI_APPNUM) to determine who is who: in the above case big_program would have MPI_APPNUM = 0 and all the little_programs would have MPI_APPNUM = 1.
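The MPMD approach above can be sketched in C. This is a minimal outline, not the poster's code: the tile size (1024 doubles) and the one-tile-per-rank distribution are placeholder assumptions, and it presumes a launch like `mpirun -n 1 big_program : -n 20 little_program` (a single source can serve both roles by branching on MPI_APPNUM):

```c
/* Sketch: both roles share MPI_COMM_WORLD in an MPMD launch.
   MPI_APPNUM (a predefined communicator attribute) tells each rank
   which colon-separated app it was launched as. */
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, nprocs, flag, *appnum;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_APPNUM, &appnum, &flag);

    if (flag && *appnum == 0) {
        /* big_program: owns the data, pushes one tile to each worker
           (tile size of 1024 doubles is an assumption for illustration) */
        double tile[1024] = {0};
        for (int dest = 1; dest < nprocs; ++dest)
            MPI_Send(tile, 1024, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
    } else {
        /* little_program: receives its tile from the big program (rank 0) */
        double tile[1024];
        MPI_Recv(tile, 1024, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}
```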
A: 

One answer might be to have the two programs reside in separate communicators: a single launcher could start both sets of processes using MPI-2's dynamic process management, and the "producer" program could communicate with the "consumer" application through MPI_COMM_WORLD. All IPC within the consumer app would then have to run inside a subcommunicator that excludes the producer portion, which means rewriting any code that calls MPI_COMM_WORLD directly.
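The subcommunicator part can be sketched with MPI_Comm_split. This assumes, purely for illustration, that the producer is world rank 0; passing MPI_UNDEFINED as the color leaves that rank out of the new communicator:

```c
/* Sketch: carve a consumer-only subcommunicator out of MPI_COMM_WORLD.
   "Producer is rank 0" is an assumption -- adjust the color test to
   however the producer is actually identified. */
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Producer opts out with MPI_UNDEFINED; consumers share color 1. */
    int color = (rank == 0) ? MPI_UNDEFINED : 1;
    MPI_Comm consumers;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &consumers);

    if (consumers != MPI_COMM_NULL) {
        /* All consumer-side IPC (collectives etc.) uses 'consumers',
           never MPI_COMM_WORLD, so the producer is excluded. */
        MPI_Comm_free(&consumers);
    }

    MPI_Finalize();
    return 0;
}
```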

Matt
A: 

Based on your description, "Program 1" is not an MPI application and "Program 2" is. The shortest path to a solution is likely to open a socket between the two programs and send the data that way. This does not require modifying "Program 1" to be an MPI program. I would begin with a socket between "Program 1" and rank 0 of "Program 2", with rank 0 distributing the data to the remaining ranks.

Several suggestions so far have involved launching a heterogeneous set of executables as one possible solution. There is no requirement that all the ranks in a single MPI job run the same executable. This does, however, require that both executables be "MPI programs" (i.e. each must call at least MPI_Init and MPI_Finalize). The level of modification required to "Program 1", and the inability to run it outside of the MPI environment, may make this option unattractive.

I would recommend that you avoid the "dynamic process" approach unless you are using a commercial implementation that offers support. Support for connect/accept tends to be spotty in the open-source implementations of MPI. It may "just work", but getting technical help if it does not can be an open-ended problem.

semiuseless