I'm fairly new to OpenMP and I'm trying to start an individual thread to process each item in a 2D array.

So essentially, this:

for (i = 0; i < dimension; i++) {
    for (int j = 0; j < dimension; j++) {
        a[i][j] = b[i][j] + c[i][j];
    }
}

What I'm doing is this:

#pragma omp parallel for shared(a,b,c) private(i,j) reduction(+:diff) schedule(dynamic)
    for (i = 0; i < dimension; i++) {
        for (int j = 0; j < dimension; j++) {
            a[i][j] = b[i][j] + c[i][j];
        }
    }

Does this in fact start a thread for each item in the 2D array or not? How would I test that? If it is wrong, what is the correct way to do it? Thanks!

Note: The code has been greatly simplified

+5  A: 

Only the outer loop is parallelized in your code sample. You can check this by printing omp_get_thread_num() in the inner loop: for a given i, the thread number is always the same (of course, this test is suggestive rather than definitive, since different runs will give different results). For example, with:

#include <stdio.h>
#include <omp.h>
#define dimension 4

int main() {
    #pragma omp parallel for
    for (int i = 0; i < dimension; i++)
        for (int j = 0; j < dimension; j++)
            printf("i=%d, j=%d, thread = %d\n", i, j, omp_get_thread_num());
}

I get:

i=1, j=0, thread = 1
i=3, j=0, thread = 3
i=2, j=0, thread = 2
i=0, j=0, thread = 0
i=1, j=1, thread = 1
i=3, j=1, thread = 3
i=2, j=1, thread = 2
i=0, j=1, thread = 0
i=1, j=2, thread = 1
i=3, j=2, thread = 3
i=2, j=2, thread = 2
i=0, j=2, thread = 0
i=1, j=3, thread = 1
i=3, j=3, thread = 3
i=2, j=3, thread = 2
i=0, j=3, thread = 0

As for the rest of your code, it's hard to say much from such a small sample, so you might want to put more details in a new question. But, for example, you can't write private(j) when j is only declared later, inside the loop; it is automatically private in my example above. I assume diff is a variable that we can't see in the sample. Also, the loop variable i is automatically private (from the version 2.5 spec; the 3.0 spec says the same):

The loop iteration variable in the for-loop of a for or parallel for construct is private in that construct.
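Putting that together, a minimal corrected version of your pragma might look like the sketch below (only a sketch, since I can't see how diff and the arrays are declared, or where diff is actually updated, in your full code):

#pragma omp parallel for shared(a,b,c) reduction(+:diff) schedule(dynamic)
for (i = 0; i < dimension; i++) {          /* i is made private automatically */
    for (int j = 0; j < dimension; j++) {  /* j is private because it is declared inside the loop */
        a[i][j] = b[i][j] + c[i][j];
        /* presumably diff is accumulated somewhere in here */
    }
}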

Edit: All of the above is correct for the code that you and I have shown, but you may be interested in the following. OpenMP version 3.0 (available in e.g. gcc 4.4, but not 4.3) adds a collapse clause, so you could write the code exactly as you have it but with #pragma omp parallel for collapse(2) to parallelize both for loops (see the spec).
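With that clause, the only change to the test program above is the pragma, i.e. something like:

#include <stdio.h>
#include <omp.h>
#define dimension 4

int main() {
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < dimension; i++)
        for (int j = 0; j < dimension; j++)
            printf("i=%d, j=%d, thread = %d\n", i, j, omp_get_thread_num());
}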

Edit: OK, I downloaded gcc 4.5.0 and ran the above code with collapse(2); the output below shows that the inner loop is now parallelized as well:

i=0, j=0, thread = 0
i=0, j=2, thread = 1
i=1, j=0, thread = 2
i=2, j=0, thread = 4
i=0, j=1, thread = 0
i=1, j=2, thread = 3
i=3, j=0, thread = 6
i=2, j=2, thread = 5
i=3, j=2, thread = 7
i=0, j=3, thread = 1
i=1, j=1, thread = 2
i=2, j=1, thread = 4
i=1, j=3, thread = 3
i=3, j=1, thread = 6
i=2, j=3, thread = 5
i=3, j=3, thread = 7

The comments here (search for "Workarounds") are also relevant if you want to parallelize both loops under version 2.5, but the version 2.5 spec cited above is quite explicit about this (see the non-conforming examples in section A.35).
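One common version 2.5 workaround, sketched here assuming a, b, c and dimension are as in your question, is to collapse the two loops by hand into a single index:

#pragma omp parallel for
for (int n = 0; n < dimension * dimension; n++) {
    int i = n / dimension;   /* recover the row index */
    int j = n % dimension;   /* recover the column index */
    a[i][j] = b[i][j] + c[i][j];
}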

Ramashalanka
Thanks, collapse was the trick I was looking for!
achinda99
A: 

You can try using nested omp parallel for loops (after an omp_set_nested(1) call), but nesting is not supported by all OpenMP implementations.
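A sketch of what the nested variant looks like (f(i,j) is just a placeholder for your per-element work, as in the code further below):

omp_set_nested(1);                 /* allow nested parallel regions */

#pragma omp parallel for
for (int i = 0; i < dimension; i++) {
    /* the inner loop only gets its own team if the implementation supports nesting */
    #pragma omp parallel for
    for (int j = 0; j < dimension; j++)
        f(i, j);
}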

So my suggestion is to make a 2D grid of blocks and start all the threads from a single for loop (example for a fixed 4x4 grid of blocks):

#pragma omp parallel for
for (int k = 0; k < 16; k++)
{
    /* map the flat block index k to a block in the 4x4 grid */
    int i_min = (k / 4) * (dimension / 4);
    int i_max = (k / 4 + 1) * (dimension / 4);
    int j_min = (k % 4) * (dimension / 4);
    int j_max = (k % 4 + 1) * (dimension / 4);

    for (int i = i_min; i < i_max; i++)
        for (int j = j_min; j < j_max; j++)
            f(i, j);
}
osgx