I need to pass an array of integer arrays (basically a 2D array) to all the processors from the root. I am using MPI in C. How do I declare an MPI datatype for a 2D array, and how should I send the message (should I use broadcast or scatter)?
If you are sending a contiguous block of data (I think C arrays are contiguous, but I'm a Fortran programmer and am not terribly sure), you don't need to declare a new MPI datatype, though there are some reasons why you might want to. Scattering is for distributing, say, an array across a number of processes; you might use scatter to send each row of an array to a different process. So for your example of a contiguous array of integers, your simplest option is to broadcast, like this (bearing in mind my poor C skills):
MPI_Bcast(&buf, numRows*numCols, MPI_INT, root, MPI_COMM_WORLD);

where:

&buf is the address of the first element in the array;
numRows*numCols is, of course, the number of elements in the 2D array;
MPI_INT is (probably) the intrinsic datatype you will be using;
root is the rank of the process which is broadcasting the array;
MPI_COMM_WORLD is the usual default communicator; change if required.
And don't forget that broadcasting is a collective operation: all processes make the same call.
If your array is not contiguous, post again with some sample array sizes, and we'll figure out how to define an MPI datatype.
1) Your array of arrays can't be passed directly to another process, because virtual addresses might be different; that is, the first-dimension array holding the pointers to the other arrays won't make sense on any other process. So you have to pass each array separately, and manually reassemble your "2D array" on the receiver side.
2) Broadcast vs. scatter. Broadcast sends the complete array to all other MPI ranks in the communicator. Scatter, on the other hand, distributes the source array over all the MPI ranks; that is, with broadcast each rank receives a copy of the source array, while with scatter each rank receives a different part of it.
You'll need to use Broadcast, because you want to send a copy of the same message to every process. Scatter breaks up a message and distributes the chunks between processes.
As for how to send the data: the HIndexed datatype is for you.
Suppose your 2d array is defined like this:
int N;           // number of arrays (first dimension)
int sizes[N];    // number of elements in each array (second dimension)
int* arrays[N];  // pointers to the start of each array
First you have to calculate the displacement of each array's starting address, relative to the starting address of the datatype, which can be the starting address of the first array to make things convenient:
MPI_Aint base;
MPI_Address(arrays[0], &base); // MPI_Address is the older name for MPI_Get_address
MPI_Aint* displacements = new MPI_Aint[N];
for (int i = 0; i < N; ++i)
{
    MPI_Address(arrays[i], &displacements[i]);
    displacements[i] -= base;
}
Then the definition for your type would be:
MPI_Datatype newType;
MPI_Type_hindexed(N, sizes, displacements, MPI_INT, &newType);
MPI_Type_commit(&newType);
(Note: MPI_INT is the right intrinsic type for C ints; MPI_INTEGER is its Fortran-side counterpart.)
This definition will create a datatype that contains all your arrays packed one after the other. Once this is done, you just send your data as a single object of this type:
MPI_Bcast(arrays[0], 1, newType, root, comm); // 'root' and 'comm' are whatever you need; arrays[0] is the base address the displacements are relative to
However, you're not done yet. The receiving processes will need to know the sizes of the arrays you're sending: if that knowledge isn't available at compile time, you'll have to send a separate message with that data first (a simple array of ints). If N, sizes and arrays are defined as above on the receiving processes, with enough space allocated to fill the arrays, then all the receiving processes need to do is define the same datatype (exact same code as the sender) and then receive the sender's message as a single instance of that type:
MPI_Bcast(arrays[0], 1, newType, root, comm); // 'root' and 'comm' must have the same values as in the sender's code
And voilà! All processes now have a copy of your array.
Of course, things get a lot easier if the second dimension of your 2D array is fixed to some value M. In that case, the easiest solution is to simply store it in a single int[N*M] array: C++ guarantees that it's all contiguous memory, so you can broadcast it without defining a custom datatype, like this:
MPI_Bcast(arrays, N*M, MPI_INT, root, comm);
Note: you might get away with using the Indexed type instead of HIndexed. The difference is that in Indexed the displacements array is given in number of elements, while in HIndexed it's in number of bytes (the H stands for "heterogeneous"). If you were to use Indexed, the values in displacements would have to be divided by sizeof(int). However, I'm not sure that integer arrays allocated at arbitrary positions on the heap are guaranteed to line up on int boundaries in C++, and in any case the HIndexed version has (marginally) less code and produces the same result.