Hi! I'm trying to implement the list ranking problem (also known as shortcutting) with OpenMP to compute the prefix sums of the array W. I don't know if I'm using the flush pragma correctly. I also get a warning when compiling: "barrier region may not be closely nested inside of work-sharing, critical, ordered, master or explicit task region"

#include <stdio.h> 
#include <stdlib.h>
#include <math.h>
#include <omp.h>

main(int argc, char *argv[])
{ 
  int Q[9]={1,2,3,4,5,6,7,8,0};
  int W[8]={1,2,3,4,5,6,7,8};
  int i,j=6,id;

  printf("Before:\n");
  for(j=0;j<8;j++)
  printf("%d",W[j]);
  printf("\n");
  #pragma omp parallel for shared(Q,W) private(id) num_threads(7)
  for (i=6; i>=0; i--)
  {
    id= omp_get_thread_num();
    while((Q[i] !=0)&& (Q[Q[i]] !=0))
    { 
      #pragma omp flush(W)

       W[i]=W[i]+W[Q[i]];

      #pragma omp flush(W)

       printf("Am %d \t W[%d]= %d",id,i,W[i]);

     #pragma omp barrier    
     #pragma omp flush(Q)
     Q[i]=Q[Q[i]];
     #pragma omp flush(Q)
     printf("Am %d \n Q[%d]= %d",id,i,Q[i]);
   };
 }
  printf("Result:\n");
  for(j=0; j<8; j++)
   printf("%d \t",W[j]);
   printf("\n");

}

Please help!

A: 

You can't use a barrier inside an omp parallel for; you can pretty much only use a barrier directly inside an omp parallel region.

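The reason is that a barrier has to be reached by every thread in the team, but a work-sharing loop hands each thread a different subset of the iterations, and your while loop runs a different number of times for each i. The threads would therefore not all arrive at the barrier the same number of times and the program could hang, which is why the compiler warns about it.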

I didn't look up the algorithm here, but two reasonable choices are to split the work into two parallel for loops, one after the other, at the point where the barrier is, or to restructure your algorithm around a #pragma omp parallel region.
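As a rough illustration of the second option, here is a minimal sketch of pointer jumping inside a single parallel region. It is not your code fixed up: the function name list_rank, the use of -1 (rather than 0) as the end-of-list marker, and the scratch arrays w_new and next_new are all assumptions made for the example. Each round reads only the old arrays and writes only the scratch arrays, and the implicit barrier at the end of each omp for separates the rounds, so no explicit barrier or flush is needed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: pointer jumping over a linked list stored in arrays.
 * next[i] is the successor of node i, or -1 for the tail (assumed convention).
 * On return, w[i] holds the sum of the weights from node i to the tail.
 * Note that next[] is overwritten in the process. */
void list_rank(int *w, int *next, int n)
{
    assert(n > 0);
    int *w_new = malloc(n * sizeof *w_new);
    int *next_new = malloc(n * sizeof *next_new);
    if (w_new == NULL || next_new == NULL)
        abort();

    #pragma omp parallel
    {
        /* span is declared inside the region, so every thread has its own
         * copy and all threads execute the same number of rounds. */
        for (int span = 1; span < n; span *= 2) {
            /* Step 1: read the old arrays, write only the scratch arrays. */
            #pragma omp for
            for (int i = 0; i < n; i++) {
                if (next[i] != -1) {
                    w_new[i] = w[i] + w[next[i]];
                    next_new[i] = next[next[i]];
                } else {
                    w_new[i] = w[i];
                    next_new[i] = -1;
                }
            }
            /* The implicit barrier at the end of the loop above guarantees
             * that step 1 is finished everywhere before step 2 starts. */

            /* Step 2: publish the results of this round. */
            #pragma omp for
            for (int i = 0; i < n; i++) {
                w[i] = w_new[i];
                next[i] = next_new[i];
            }
        }
    }

    free(w_new);
    free(next_new);
}
```

Calling list_rank(w, next, 8) with w = {1,2,3,4,5,6,7,8} and next = {1,2,3,4,5,6,7,-1} should leave w[0] == 36 and w[7] == 8. With gcc you would build it with -fopenmp; without that flag the pragmas are ignored and the same code runs serially.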

I have since looked up the list ranking algorithm; if you must use OpenMP, you will be well served by finding an implementation of prefix sum (scan).
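To give an idea of what such a scan can look like, here is a small sketch of an inclusive prefix sum over a plain array, using one parallel for per round and a scratch buffer. The function name prefix_sum and the buffer buf are made up for the example, and this is the simple doubling scheme (about n log n additions in total), not a tuned implementation like the ones shipped in libraries.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: inclusive prefix sum.
 * On return, a[i] == a[0] + a[1] + ... + a[i] (in terms of the input values). */
void prefix_sum(int *a, int n)
{
    assert(n > 0);
    int *buf = malloc(n * sizeof *buf);
    if (buf == NULL)
        abort();

    int *src = a;   /* values from the previous round */
    int *dst = buf; /* values being written in this round */

    for (int offset = 1; offset < n; offset *= 2) {
        /* Every iteration reads src and writes a distinct dst[i], so the
         * iterations are independent; the parallel for ends with an implicit
         * barrier before the pointers are swapped. */
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            dst[i] = (i >= offset) ? src[i] + src[i - offset] : src[i];

        int *tmp = src;
        src = dst;
        dst = tmp;
    }

    /* After an odd number of rounds the result lives in buf. */
    if (src != a)
        memcpy(a, src, n * sizeof *a);
    free(buf);
}
```

For a = {1,2,3,4,5,6,7,8} this should produce {1,3,6,10,15,21,28,36}. Keeping the reads and writes in separate buffers is what removes the need for flush here: no thread ever reads a value that another thread is writing in the same round.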

-Rick

Rick
Thanks Rick, I noticed that there is no need to use a barrier in the loop. I put #pragma omp flush(Q) and #pragma omp flush(W) after modifying the values of Q[i] and W[i], but I still have a problem when one thread does a flush before another thread has changed its Q[i] value.
maya
maya, if you are on Windows, take a look at the Parallel Patterns Library in VS2010; we implemented prefix scan as a sample at code.msdn.com/concrtextras. If you aren't on Windows, Intel's Threading Building Blocks has scan implemented. You can also take a look at the sample and replace parallel_for with an OpenMP for.
Rick