views:

395

answers:

2

I would like to implement a parallel version of the code below using threads in OpenMP,is there any better way to do this?

/* Program to compute Pi using Monte Carlo methods */

#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <string.h>
#include <time.h>
#define SEED 35791246

int main(int argc, char* argv)
{
   int niter=0;
   double x,y;
   int i,count=0; /* # of points in the 1st quadrant of unit circle */
   double z;
   double pi;
   clock_t end_time, start_time; 


   printf("Enter the number of iterations used to estimate pi: ");
   scanf("%d",&niter);

   start_time = clock(); 
   /* initialize random numbers */
   srand(SEED);
   count=0;

 #pragma omp parallel for
      for ( i=0; i<niter; i++) {
      x = (double)rand()/RAND_MAX;
      y = (double)rand()/RAND_MAX;
      z = x*x+y*y;
      if (z<=1) count++;
    }  
#pragma omp task
   pi=(double)count/niter*4;
#pragma omp barrier

   end_time = clock();

   printf("# of trials= %d , estimate of pi is %g, time= %f \n",niter,pi, difftime(end_time, start_time));

   return 0;
}
+2  A: 

The rand function is not a good idea to use here. Either it is not threadsafe and you will have threads getting duplicate values (not very random), or it will have a lock and the MP version will be slower than the single-thread version.

I would recommend finding another random number generator that can be used simultaneously from multiple threads.

Gabe
That's not exactly true -- as stated in comments in http://www.thinkingparallel.com/2006/08/21/scoped-locking-vs-critical-in-openmp-a-personal-shootout/ (see second comment), functions such as rand() provided by the base language are required by OpenMP to be thread-safe. However, you are likely to be right about duplicate values or slowness from internal locking, as these are not "thread safety" issues as that word is normally defined. The advice about choosing a better option is wise.
Brooks Moses
`rand` requires a reproducible internal value to be kept. That is pretty much by design and makes it inherently locked or thread unsafe. Additionally in traditional UNIX and BSD systems it uses a very bad pseudo-random number generator. Reading from good PRNGs or RNGs limits speed again with at least kernel locks for file I/O. This is a case where re-inventing the wheel or using 3rd party code might actually be better than using standard library functions.
jbcreix
+2  A: 

It could be improved by correcting some OpenMP bugs. First, since you're summing up (copies of) count in all of the parallel threads, you need to apply a reduction operator at the end of the parallel segment to combine all of those back into a single value. Also, the variables i, x, y, and z need to have individual instances for each parallel thread -- you don't want the threads using the same one! To specify all of that, your #pragma directive at the top of the loop should be:

#pragma omp parallel for private(i, x, y, z) reduction(+:count)

Also, the scope of that is the for loop, so you don't need to do anything else; there will automatically be a synchronization of the threads after the loop exits. (And you need that synchronization to get count to contain all the increments from all threads!) In particular, your task and barrier pragmas are meaningless, as at that point you are back to just one thread -- and, besides, there's no point in putting that single computation in a parallel task.

And there's the issue that gabe raised about the likely slowness and/or poor randomness of the system random number generator in these cases. You will probably want to investigate the particulars of that on your system, and give it a new random seed in each thread or use a different random-number generator depending on what you find.

Besides that, it looks fairly reasonable. Not much else you can do to that algorithm, as it's short and trivially parallelizable.

Brooks Moses
It all makes sense to me now.Am still learning how to parallelise computation using OpenMP.From what Brooks suggested,i did specify #pragma omp parallel for private(i, x, y, z) reduction(+:count) at the top of the loop and run the program again.It still took a longer time to do the computation compared to the serial algorithm.This then takes me to the issue raised by gabe of rand function not being thread friendly.How do i assign a new random seed in each thread or use a different random-number generator in my code to improve performance of the algorithm?
kayn