Hi,

(1) How can I use MPI to speed up the time-consuming computation in the loop of my code below?

 int main(int argc, char ** argv)   
 {   
 // some operations           
 f(size);           
 // some operations         
 return 0;   
 }   

 void f(int size)   
 {   
 // some operations          
 int i;           
 double * array =  new double [size];           
 for (i = 0; i < size; i++) // how can I use MPI to speed up this loop to compute all elements in the array?   
 {   
 array[i] = complicated_computation(); // time-consuming computation   
 }           
 // some operations using all elements in array           
 delete [] array;  
 }

As shown in the code, I want to do some operations before and after the part to be parallelized with MPI, but I don't know how to specify where the parallel part begins and ends.

(2) My current code uses OpenMP to speed up the computation.

 void f(int size)   
 {   
 // some operations           
 int i;           
 double * array =  new double [size];   
 omp_set_num_threads(_nb_threads);  
 #pragma omp parallel shared(array) private(i)  
 {
 #pragma omp for schedule(dynamic) nowait          
 for (i = 0; i < size; i++) // how can I use MPI to speed up this loop to compute all elements in the array?   
 {   
 array[i] = complicated_computation(); // time-consuming computation   
 }          
 } 
 // some operations using all elements in array           
 }

If I switch to MPI, is it possible to write the code so that it uses both OpenMP and MPI? If so, how should I write, compile, and run it?

(3) Our cluster has three versions of MPI: mvapich-1.0.1, mvapich2-1.0.3, openmpi-1.2.6. Is their usage the same, especially in my case? Which one would be best for me to use?

Thanks and regards!


UPDATE:

I'd like to explain a bit more about my question on how to specify the start and end of the parallel part. In the following toy code, I want to confine the parallel part to function f():

#include "mpi.h"  
#include <stdio.h>  
#include <string.h>  

void f();

int main(int argc, char **argv)  
{  
printf("%s\n", "Start running!");  
f();  
printf("%s\n", "End running!");  
return 0;  
}  


void f()  
{  
char idstr[32]; char buff[128];  
int numprocs; int myid; int i;  
MPI_Status stat;  

printf("Entering function f().\n");

MPI_Init(NULL, NULL);  
MPI_Comm_size(MPI_COMM_WORLD,&numprocs);  
MPI_Comm_rank(MPI_COMM_WORLD,&myid);  

if(myid == 0)  
{  
  printf("WE have %d processors\n", numprocs);  
  for(i=1;i<numprocs;i++)  
  {  
    sprintf(buff, "Hello %d", i);  
    MPI_Send(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD);  
  }  
  for(i=1;i<numprocs;i++)  
  {  
    MPI_Recv(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD, &stat);  
    printf("%s\n", buff);  
  }  
}  
else  
{  
  MPI_Recv(buff, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &stat);  
  sprintf(idstr, " Processor %d ", myid);  
  strcat(buff, idstr);  
  strcat(buff, "reporting for duty\n");  
  MPI_Send(buff, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD);  
}  
MPI_Finalize();  

printf("Leaving function f().\n");  
}  

However, the output is not what I expected. The printf calls before and after the parallel part were executed by every process, not just the main process:

$ mpirun -np 3 ex2  
Start running!  
Entering function f().  
Start running!  
Entering function f().  
Start running!  
Entering function f().  
WE have 3 processors  
Hello 1 Processor 1 reporting for duty  

Hello 2 Processor 2 reporting for duty  

Leaving function f().  
End running!  
Leaving function f().  
End running!  
Leaving function f().  
End running!  

So it seems to me the parallel part is not limited between MPI_Init() and MPI_Finalize().

Besides this one, I am still hoping someone could answer my other questions. Thanks!

+1  A: 

If all the values in the array are independent, then it should be trivially parallelizable. Split the array into chunks of roughly equal size, give each chunk to a node, and then compile the results back together.
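To make the "split into chunks of roughly equal size" step concrete, here is a minimal sketch of the index arithmetic. The helper name chunk_range and its parameters are hypothetical (they are not part of MPI or of the original code); in an MPI program, `part` would typically be the rank from MPI_Comm_rank and `nparts` the count from MPI_Comm_size.

```c
/* Hypothetical helper (not part of MPI): computes the half-open index
   range [*begin, *end) owned by chunk `part` when `size` elements are
   split into `nparts` contiguous chunks of nearly equal size.
   The first (size % nparts) chunks receive one extra element. */
void chunk_range(int size, int nparts, int part, int *begin, int *end)
{
    int base  = size / nparts;   /* minimum number of elements per chunk */
    int extra = size % nparts;   /* leftover elements to spread out */

    /* Each earlier chunk that got an extra element shifts the start by one. */
    *begin = part * base + (part < extra ? part : extra);
    *end   = *begin + base + (part < extra ? 1 : 0);
}
```

Each process would then loop over only its own range, i.e. for (i = begin; i < end; i++), instead of over 0..size.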

Kyle Butt
Thanks Kyle! Could you give sample code for my case please? I want to do some operations, especially on the array, before and after the parallel part using MPI, but I don't know how to specify where the parallel part begins and ends. Also I have added more to my questions.
Tim
+1  A: 

Hello.

MPI_Init should be called with &argc and &argv (some MPI implementations require these arguments), and it should be the first statement executed in main(). MPI_Finalize should be the very last statement executed.

In an MPI environment, main() is started in every process. Depending on the implementation, argc and argv may carry launcher parameters such as the number of nodes, the node ID, and the master node address.

Here is a skeleton:

#include "mpi.h"  
#include <stdio.h>  
#include <string.h>  

void f();

int numprocs; int myid; 

int main(int argc, char **argv)  
{  

MPI_Init(&argc, &argv);  
MPI_Comm_size(MPI_COMM_WORLD,&numprocs);  
MPI_Comm_rank(MPI_COMM_WORLD,&myid);  

if(myid == 0)  
{  /* main process. user interaction is ONLY HERE */

    printf("%s\n", "Start running!");  

    /* MPI_Send(...): send job requests to the worker processes */
    /* the main process may also call f() itself */
    /* MPI_Recv(...): receive the results */
    printf("%s\n", "End running!");  
}
else
{

  /* Worker processes: wait here for a job from the main process */
  /* MPI_Recv(...): receive the input */
  /* dispatch the input by parsing it 
    (if there can be different types of work)
    or just do the work */    
  f();
  /* MPI_Send(...): send the results back */
}

MPI_Finalize();  

return 0;  
}  
osgx
+1  A: 

The easiest migration from OpenMP to a cluster may be "Cluster OpenMP" from Intel.

For MPI, you need to completely rewrite how the work is dispatched.

osgx
+1  A: 

Quick edit (because I either can't figure out how to leave comments, or I'm not allowed to leave comments yet): 3lectrologos is incorrect about the parallel part of MPI programs. You cannot do serial work before MPI_Init or after MPI_Finalize and expect it to actually be serial. It will still be executed by every MPI process.

I think part of the issue is that the "parallel part" of an MPI program is the entire program. MPI will start executing the same program (your main function) on each node you specify at approximately the same time. The MPI_Init call just sets certain things up for the program so it can use the MPI calls correctly.

The correct "template" (in pseudo-code) for what I think you want to do would be:

int main(int argc, char *argv[]) {
    int numprocs, myid;

    MPI_Init(&argc, &argv);  
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);  
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    if (myid == 0) { // Do the serial part on a single MPI process
        printf("Performing serial computation on cpu %d\n", myid);
        PreParallelWork();
    }

    // Every MPI process runs the parallel work. Data produced by rank 0
    // above is not shared automatically; it has to be sent explicitly.
    ParallelWork();

    if (myid == 0) { // Do the final serial part on a single MPI process
        printf("Performing the final serial computation on cpu %d\n", myid);
        PostParallelWork();
    }

    MPI_Finalize();  
    return 0;  
}  
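
For the array loop in the question, the parallel work would probably end with rank 0 collecting every process's slice before the final serial part. A minimal sketch of the bookkeeping for that step follows. The helper name gather_layout is hypothetical (it is not an MPI function); it only fills the per-rank counts and offsets that a variable-size gather such as MPI_Gatherv takes as arguments.

```c
/* Hypothetical helper (not part of MPI): for an array of `size` elements
   split across `nprocs` processes, fills
     counts[r] = number of elements computed by rank r
     displs[r] = index in the full array where rank r's slice starts.
   Both output arrays must have room for nprocs ints. */
void gather_layout(int size, int nprocs, int *counts, int *displs)
{
    int base   = size / nprocs;  /* minimum number of elements per rank */
    int extra  = size % nprocs;  /* the first `extra` ranks get one more */
    int offset = 0;
    int r;

    for (r = 0; r < nprocs; r++) {
        counts[r] = base + (r < extra ? 1 : 0);
        displs[r] = offset;
        offset += counts[r];
    }
}
```

Assuming each rank has stored its counts[myid] results in a buffer called local (an illustrative name), the collection step could then look like MPI_Gatherv(local, counts[myid], MPI_DOUBLE, array, counts, displs, MPI_DOUBLE, 0, MPI_COMM_WORLD), after which only rank 0 holds the complete array.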
J Teller