Hi,

I have a situation like:

#pragma omp parallel for private(i, j, k, val, p, l)
for (i = 0; i < num1; i++)
{
    for (j = 0; j < num2; j++)
    {
        for (k = 0; k < num3; k++)
        {
            val = m[i + j*somenum + k*2];

            if (val != 0)
            {
                for (l = start; l <= end; l++)
                {
                    someFunctionThatWritesIntoGlobalArray((i + l), j, k,
                        (someFunctionThatGetsValueFromAnotherArray((i + l), j, k) * val));
                }
            }
        }
    }

    for (p = 0; p < num4; p++)
    {
        m[p] = 0;
    }
}

Thanks for reading, phew! I am noticing a very small difference in the results (0.999967 with OpenMP against 1 with the serial version) when I use the above, which runs about 3 times faster than the serial implementation. I know I am making a mistake here; in particular, the dependence between the loops is the obvious suspect. Is it possible to parallelize this using omp sections? I tried some options, such as making p shared (that gave me correct values, matching the serial version, but there was no speedup).

Any general advice on applying OpenMP pragmas to a slew of nested for loops would also be appreciated!

+1  A: 

This is really a restatement or refinement of your earlier question; it would have helped SOers if you had edited that one rather than asking a 'new' question. Still ...

As you've written your code, OpenMP will parcel out the iterations of your outermost loop, the one controlled by the statement

for (i = 0; i < num1; i++)

to the available threads. So, using the default loop schedule, if you have 4 threads each of them will execute roughly 1/4 of the iterations. This probably means that thread 0 runs iterations i = 0, 1, 2, ..., thread 1 runs iterations num1/4, (num1/4)+1, ..., and so on. If you are a beginner at OpenMP programming you really must investigate for yourself how loop iterations are spread across threads. You should also investigate the effects of modifying the loop schedule; this is an essential part of learning about parallel programming.
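For illustration, here is a minimal, self-contained sketch (compile with an OpenMP-enabled compiler, for example gcc -fopenmp) that prints which thread runs which iteration. Changing the schedule clause, say to schedule(static, 1) or schedule(dynamic), shows how the assignment changes:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int i;
    /* With the default (implementation-defined, usually static) schedule,
       each thread gets a contiguous chunk of the iteration space. */
    #pragma omp parallel for
    for (i = 0; i < 16; i++)
    {
        printf("iteration %2d run by thread %d\n", i, omp_get_thread_num());
    }
    return 0;
}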

OpenMP will then execute the inner loops on each thread, so each thread will execute, serially, the loops controlled by the variables j, k, l and p. These will not be further parallelised; your program does not use nested parallelism.
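As an aside, and only as a sketch of a general technique (it does not apply directly to the code above, because the body of the i loop contains more than the j loop): when loops are perfectly nested, the collapse clause lets OpenMP divide the combined iteration space among the threads instead of just the outermost loop:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int i, j;
    /* collapse(2) merges the i and j loops into one 4*3 = 12-iteration
       space and shares that space among the threads. It requires the
       loops to be perfectly nested: no statements between the two fors. */
    #pragma omp parallel for collapse(2)
    for (i = 0; i < 4; i++)
        for (j = 0; j < 3; j++)
            printf("(i=%d, j=%d) on thread %d\n", i, j, omp_get_thread_num());
    return 0;
}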

One consequence of this is that every thread will reset m[p] to 0 for all values of p at the end of each of its outer iterations, while other threads may still be reading m. This does not look sensible to me.
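If m is only scratch space that each outer iteration fills and then clears (the posts do not say whether that is really the case, so treat this purely as an assumption), one common pattern is to give each thread its own private scratch buffer, so that the reset never touches data another thread is reading. A minimal sketch of that pattern, with a hypothetical use_scratch() standing in for the real work:

#include <stdlib.h>
#include <omp.h>

/* Hypothetical stand-in for the real per-iteration computation. */
static void use_scratch(double *scratch, int n, int i)
{
    int p;
    for (p = 0; p < n; p++)
        scratch[p] += i;                       /* placeholder work */
}

int main(void)
{
    const int num1 = 100, scratch_size = 64;   /* assumed sizes */
    int i;

    #pragma omp parallel private(i)
    {
        /* Each thread owns its own buffer, so zeroing it cannot race
           with reads made by other threads. */
        double *scratch = calloc(scratch_size, sizeof(double));

        #pragma omp for
        for (i = 0; i < num1; i++)
        {
            int p;
            use_scratch(scratch, scratch_size, i);
            for (p = 0; p < scratch_size; p++)  /* per-iteration reset */
                scratch[p] = 0.0;
        }

        free(scratch);
    }
    return 0;
}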

You write that there is a discrepancy between the results of the serial and parallel implementations, but you do not specify which result differs. Which variable has a different value at the end of the loops? In general you should not expect exact equality of floating-point results from serial and parallel programs, since the order in which floating-point operations are executed matters: floating-point arithmetic is not associative, nor is it distributive. Even the simple operation of adding a set of numbers together cannot, in the general case, be guaranteed to give bit-identical results for serial and parallel executions of the same program.
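A tiny illustration of that point with doubles:

#include <stdio.h>

int main(void)
{
    /* Floating-point addition is not associative: the grouping of the
       operands changes the rounded result. A parallel reduction that
       combines partial sums in a different order can therefore differ
       in the last bits from a serial sum. */
    double a = 1.0e16, b = -1.0e16, c = 1.0;
    printf("(a + b) + c = %.1f\n", (a + b) + c);   /* prints 1.0 */
    printf("a + (b + c) = %.1f\n", a + (b + c));   /* prints 0.0 */
    return 0;
}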

However, without knowing how the result you report is computed, it's utterly impossible to say why you get the difference. It could be 'normal' behaviour, or it could be an error.

High Performance Mark
nicely said. +1
aaa
I apologize for the telltale holes and for creating a separate thread. The program is a pretty huge one, I am only dealing with a part of it, and I agree that I lost specificity while trying to simplify it. But I reckon your description gives me enough cues to start digging deeper into the program (and OpenMP), since I really do not have a thorough understanding of the code's behaviour w.r.t. OpenMP. Thanks, Sayan
Sayan Ghosh
As you said --- "One consequence of this is that all threads will update the array m for all values of p. This does not look sensible to me." I did enclose that section in a #pragma omp critical, which made the program's run time nearly equal to that of the serial version.
Sayan Ghosh