Hello,

I have two functions, do_step_one(i) and do_step_two(i), to call for i from 0 to N-1.

Currently, I have this (sequential) code:

for(unsigned int i=0; i<N; i++){
     do_step_one(i);
}

for(unsigned int i=0; i<N; i++){
     do_step_two(i);
}

Within either step (one or two), the N iterations can run in any order and in parallel, but no step_two call may start until every step_one call has finished, because step two uses the results of step one.

I tried the following :

#pragma omp parallel for
for(unsigned int i=0; i<N; i++){
    do_step_one(i);

#pragma omp barrier

    do_step_two(i);
}

But gcc complains

convolve_slices.c:21: warning: barrier region may not be closely nested inside of work-sharing, critical, ordered, master or explicit task region.

What am I misunderstanding? How can I solve this issue?

Thanks.

+2  A: 

One problem I see with this code is that it does not comply with the spec :) A barrier may not be placed inside a work-sharing loop, which is exactly what the gcc warning says.

If you need all do_step_one()'s to end, you'll need something like the following:

#pragma omp parallel for
for(unsigned int i=0; i<N; i++){
     do_step_one(i);
}

#pragma omp parallel for
for(unsigned int i=0; i<N; i++){
     do_step_two(i);
}

This runs the first loop in parallel and then, once all of its iterations have finished, runs the second loop in parallel.

Anna
I have forgotten most of my OMP work - does this method still maintain the threads or does it need to recreate them for the second `parallel for`?
Gavin Miller
I'm not sure about that. It is an implementation detail. Formally, the runtime may create the threads again for the second loop, but I think implementations are likely to have optimizations that avoid that.
Anna
Oh, I think I understand now what you're getting at: the first for has to end before the second one starts, since the parallelism is applied per block. To be extra certain (it will probably have no effect), it is possible to place a barrier between the two loops.
Anna
The end of a parallel for has an implicit barrier on it. That is, the master thread waits for all threads to complete. I'm just thinking that this version would incur the overhead of creating the threads a second time _if_ OMP doesn't maintain the threads. Whether that makes a difference depends upon the implementation of `do_step_one()` and `do_step_two()`.
Gavin Miller
You're right, but my guess is that they have a smart implementation that doesn't just kill and re-create threads, but reuses them later. Anyway, it's easy to check when running the program (for example by using top on Linux).
Anna
This solution is right, thanks. By the way, I also need a way to disable the implicit barrier at the end of the first parallel for (it's for teaching purposes: I want to show my students why synchronisation between threads is needed). Thanks.
Guillaume Bouchard
You can go with a different approach: make a single loop that runs 2N times, doing step one for the first N iterations and step two for the remaining N. Parallelise this for like in the example above.
Anna