
Hi,

I am trying to multiply square matrices in parallel with MPI.

I use an MPI_Type_vector to send square submatrices (arrays of float) to the processes, so they can calculate subproducts. Then, for the next iterations, these submatrices are sent to neighbouring processes as MPI_Type_contiguous (the whole submatrix is sent). This part is working as expected, and local results are correct.

Then, I use MPI_Gather with the contiguous types to send all local results back to the root process. The problem is that the final matrix is built (obviously, with this method) row by row instead of submatrix by submatrix.

I wrote an ugly procedure that rearranges the final matrix, but I would like to know whether there is a direct way of performing the "inverse" operation of sending MPI_Type_vectors (i.e., sending a contiguous array of values and having it arranged directly in subarray form in the receiving array).

An example, to try to clarify my long text:

A[16] and B[16] (these really are 2D arrays, A[4][4] and B[4][4]) are 4x4 matrices to be multiplied; C[4][4] will contain the result; 4 processes are used (Pi with i from 0 to 3):

Pi gets two 2x2 submatrices: subAi[4] and subBi[4]; their product is stored locally in subCi[4].

For instance, P0 gets:

subA0[4] containing A[0], A[1], A[4] and A[5];
subB0[4] containing B[0], B[1], B[4] and B[5].

After everything is calculated, the root process gathers all subCi[4].

Then C[4][4] contains:

[
subC0[0], subC0[1], subC0[2], subC0[3],
subC1[0], subC1[1], subC1[2], subC1[3],
subC2[0], subC2[1], subC2[2], subC2[3],
subC3[0], subC3[1], subC3[2], subC3[3]]

and I would like it to be:

[
subC0[0], subC0[1], subC1[0], subC1[1],
subC0[2], subC0[3], subC1[2], subC1[3],
subC2[0], subC2[1], subC3[0], subC3[1],
subC2[2], subC2[3], subC3[2], subC3[3]]

without further operation. Does someone know a way?

Thanks for your advice.

Adding information in answer to 'High Performance Mark':

1. Well, my initial matrices are 2D arrays (in the shape of A[4][4]). I wanted to keep the question short; I see now it was a bad idea...

I did define the MPI_Type_vector as follows, for the example:

MPI_Type_vector(2, 2, 4, MPI_FLOAT, &subMatrix);

(By the way, I can't see any difference for a flattened array).
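For completeness, the type is created and committed along these lines before use (a sketch; MPI_Type_commit is required before the type can be used in communication):

    MPI_Datatype subMatrix;
    /* 2 blocks of 2 floats each, with a stride of 4 floats between
       block starts: a 2x2 submatrix of a 4x4 float matrix */
    MPI_Type_vector(2, 2, 4, MPI_FLOAT, &subMatrix);
    MPI_Type_commit(&subMatrix);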

2. I am not an expert in MPI, far from it, so I might be doing strange things. Here is a bit of my code, applied to the example (only A is dealt with; B is quite similar):

Sending submatrices from the root to the slave processes:

if (rank == 0) {    /* master process distributes the blocks */
    for (i = 0; i < 2; i++)
        for (j = 0; j < 2; j++)
            /* the 2x2 block starting at row j*2, column ((i+j)%2)*2 goes to rank i + j*2 */
            MPI_Send(&A[j * 2][(i + j) % 2 * 2], 1, subMatrix, i + j * 2, 42, MPI_COMM_WORLD);
}

The slaves receive:

MPI_Recv(subA, 4, MPI_FLOAT, 0, 42, MPI_COMM_WORLD, &status);  /* the strided block arrives as 4 contiguous floats */

Then, exchanges between processes are done via MPI_Send and MPI_Recv of subMatrixLocal, which is:

MPI_Type_contiguous(4, MPI_FLOAT, &subMatrixLocal);  /* a whole 2x2 block as 4 consecutive floats */

After all local operations are done, I gather all the subC matrices into C:

MPI_Gather(subC, 1, subMatrixLocal, C, 1, subMatrixLocal, 0, MPI_COMM_WORLD);  /* each rank's 4 floats land one after another in C */

and I obtain the previously stated result, which I have to reorder...

And about your proposed algorithm: the next step will be to do the matrix multiplications on GPUs, where square matrix products are efficient. MPI will just be used to transfer matrices between CPUs. Of course, global efficiency will be tested then.

You said that "the same type definition should be applicable for the reverse operation". However, my MPI_Type_vector works fine on the "large" matrix, but using it directly on a submatrix isn't possible (applying MPI_Type_vector(2, 2, 4) to a 2x2 matrix gives wrong results, as it takes the last two values from "outside" the defined array...). Do you mean I should create another MPI_Type_vector and send/receive it?

A:

The answer to your question *is there a direct way of performing the 'inverse' operation of sending MPI_Type_vectors* is yes. If you have already defined a type vector to send a submatrix from one process to another, the same type definition should be applicable for the reverse operation.
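For instance, here is a minimal sketch for your 4x4/2x2 example, assuming rank p should own the block at block-row p/2 and block-column p%2, as in your desired layout. The trick is to shrink the extent of the vector type with mpi_type_create_resized so that mpi_gatherv displacements can address individual block starts:

    MPI_Datatype blk, blkResized;
    MPI_Type_vector(2, 2, 4, MPI_FLOAT, &blk);    /* 2x2 block of a 4x4 float matrix */
    /* shrink the extent to 2 floats so displacements count in block-widths */
    MPI_Type_create_resized(blk, 0, 2 * sizeof(float), &blkResized);
    MPI_Type_commit(&blkResized);

    int counts[4] = {1, 1, 1, 1};
    /* displacements in units of the resized extent (2 floats):
       rank 0 -> block (0,0) at 0, rank 1 -> (0,1) at 1,
       rank 2 -> (1,0) at 4,       rank 3 -> (1,1) at 5 */
    int displs[4] = {0, 1, 4, 5};

    /* each rank sends its 4 contiguous floats; the root drops them
       into their 2D block positions via the resized vector type */
    MPI_Gatherv(subC, 4, MPI_FLOAT, C, counts, displs, blkResized, 0, MPI_COMM_WORLD);

The resized extent, not the type's true span, determines where successive displacements land, which is what lets the gathered blocks interleave correctly.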

However, I am somewhat confused by your explanation and have some further questions for you. If you answer them I may be able to provide better advice.

  1. You write your matrices as A[16], B[16] and say that they are 4x4. Have you already flattened them? I expected that they would be A[4][4] etc. If you have flattened the matrices, why have you done so? You can certainly define an mpi_type_vector to describe a sub-matrix of a 2D matrix.
  2. It seems a little odd to me, not necessarily wrong but odd, to be matching sends with gathers. I would normally expect to see gathers matched by scatters and sends by receives. Perhaps you could post enough of your code to clarify what operations you are using.

Finally, I think that multiplying matrices by multiplying sub-matrices is probably not an efficient approach with MPI. If you are doing this as an exercise then carry on. But a better algorithm, and probably one that is easier to implement (a rough sketch follows the list), would be

  1. mpi_broadcast matrix B to all processes;
  2. director process sends rows of A to worker processes, in a loop;
  3. worker process computes a row of C and sends it back to director process;
  4. director process receives rows of C and puts them in the right location.
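Here is a minimal sketch of that scheme, assuming square N x N float matrices with N divisible by the number of processes; it uses scatter/gather collectives for steps 2-4 rather than explicit send/receive loops, and all names are illustrative:

    #include <mpi.h>

    #define N 4    /* matrix dimension; assumes N % nprocs == 0 */

    /* Broadcast B to everyone, scatter row-bands of A, multiply locally,
       then gather the row-bands of C back on the root. */
    void rowwise_multiply(float A[N][N], float B[N][N], float C[N][N], int nprocs)
    {
        int rows = N / nprocs;            /* rows handled per process */
        float locA[N][N], locC[N][N];     /* oversized local buffers, kept simple */

        MPI_Bcast(B, N * N, MPI_FLOAT, 0, MPI_COMM_WORLD);
        MPI_Scatter(A, rows * N, MPI_FLOAT, locA, rows * N, MPI_FLOAT,
                    0, MPI_COMM_WORLD);

        for (int i = 0; i < rows; i++)
            for (int j = 0; j < N; j++) {
                locC[i][j] = 0.0f;
                for (int k = 0; k < N; k++)
                    locC[i][j] += locA[i][k] * B[k][j];
            }

        /* contributions arrive on the root in rank order, so the rows of C
           land directly in their final positions */
        MPI_Gather(locC, rows * N, MPI_FLOAT, C, rows * N, MPI_FLOAT,
                   0, MPI_COMM_WORLD);
    }

Since mpi_gather places each rank's contribution in rank order, the rows of C arrive already in their final positions, which is exactly the property your block decomposition lacks.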
High Performance Mark
I have edited my post to try and answer your questions. Thanks for your interest.
Goloneg