As you've seen, barrier won't work, and critical is rather heavy-weight for this particular operation. atomic is lighter weight than critical; you could always do
if (j >= max_j)
{
    // atomic protects only the update of data[j]; the read of data[j-max_j] is not covered
    #pragma omp atomic
    data[j] += data[j-max_j];
}
but you should always be wary of having any such construct (atomic, critical) inside a loop -- it kills performance, because it kills parallelism (that is, after all, their entire purpose).
It would help to know what you're trying to accomplish with this bit of code, because even once the data races in the updates are eliminated, the final result in (say) data[maxints-1] will depend on the order in which data[maxints-1-max_j], data[maxints-1-2*max_j], ... were updated, and OpenMP's parallel for explicitly does not guarantee that order. (You can use the ordered construct, but that's barely better than not using a parallel for at all.)
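For reference, here is a minimal sketch of what the ordered variant could look like. The function name shifted_add_ordered and the parameter n (standing in for the array length) are made up for illustration, and it assumes you want exactly the result the serial loop would give. It compiles with or without OpenMP enabled, since the pragmas are simply ignored in a serial build.

```c
/* Hypothetical helper: adds data[j - max_j] into data[j] for every j in
   [max_j, n), forcing the updates to happen in serial loop order. */
void shifted_add_ordered(int *data, int n, int max_j)
{
    int j;
    #pragma omp parallel for ordered shared(data)
    for (j = max_j; j < n; j++) {
        /* The ordered region runs one iteration at a time, in loop order,
           so each update sees the already-updated data[j - max_j]. */
        #pragma omp ordered
        {
            data[j] += data[j - max_j];
        }
    }
}
```

Since the whole loop body sits inside the ordered region, the iterations effectively run one after another, which is why this buys you very little over the plain serial loop.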
If maxints < 2*max_j, then this is easy; you can just do
#pragma omp parallel for shared(data)
for (j = max_j; j < numints; j++) {
    data[j] += data[j-max_j];
}
and you don't need any synchronization at all, because each iteration updates only its own data[j], and the data[j-max_j] it reads has an index below max_j, so no iteration ever writes it. But I get the impression that (a) that condition doesn't hold, and (b) this is a snippet of a larger piece of code...
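If you do end up in the easy case, a guarded version might look something like the sketch below. The name shifted_add_nosync, the parameter n (the array length) and the -1 return code are all hypothetical; the guard just encodes the condition above so the unsynchronized loop is never run when iterations could depend on each other.

```c
/* Hypothetical helper: runs the unsynchronized parallel update only when
   n < 2 * max_j, i.e. when every element read lies below max_j and is
   therefore never written by any iteration.
   Returns 0 on success, -1 (leaving data untouched) otherwise. */
int shifted_add_nosync(int *data, int n, int max_j)
{
    int j;

    if (n >= 2 * max_j) {
        return -1;  /* iterations would depend on each other */
    }

    #pragma omp parallel for shared(data)
    for (j = max_j; j < n; j++) {
        data[j] += data[j - max_j];
    }
    return 0;
}
```

When the guard fails you are back to the ordering question above, and the right fix depends on what the surrounding code actually needs.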