The atomic directive in OpenMP supports statements such as

x += expr
x *= expr

where expr is an expression of scalar type that does not reference x. I understand that, but I don't see why you can't write:

#pragma omp atomic
x = y;

Is this somehow more taxing in terms of CPU instructions? It seems to me that both the legal and the illegal statements load the value of x and some other scalar value, change the register value of x, and write it back. If anyone could explain how these instructions are (I assume) fundamentally different, I would be very grateful.
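For reference, here is a minimal sketch of the legal update form in use. The function name atomic_sum is made up for illustration, and the code assumes a compiler with OpenMP enabled (without it the pragmas are ignored and the loop simply runs serially):

```c
/* Hypothetical example: sum 1..n with every thread updating shared x. */
long atomic_sum(int n)
{
    long x = 0;                      /* shared accumulator */

    #pragma omp parallel for shared(x)
    for (int i = 1; i <= n; ++i) {
        /* Legal atomic form "x += expr": the read-modify-write of x
           cannot interleave with another thread's update. */
        #pragma omp atomic
        x += i;
    }
    return x;                        /* n * (n + 1) / 2 */
}
```

Here the result depends on every thread's contribution, so losing an update would change the answer.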

+1  A: 

Because the suggested atomic assignment does not protect against anything.

Remember that an atomic directive can be thought of as a critical section that could be (but does not have to be) efficiently implemented by the compiler using magic hardware. Think about two threads reaching x = y with shared x and private y. After all the threads finish, the last thread to execute "wins" and x holds its y. Wrap the assignment in a critical section and nothing changes: the last thread still "wins". Now, if the threads do something else with x afterwards, the slowest thread may not have caught up, and even if it has, the compiler could legitimately choose to use some cached value for x (i.e. the thread's local y). To avoid this, you would need a barrier (so the winning thread has won) and its implied flush (so the local cache has been invalidated):

x = y;
#pragma omp barrier
// do something with shared x...

but I cannot think of a good reason to do this. Why do all the work to find y on many threads if most of them will be (non-deterministically) thrown away?
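To make the "last thread wins" point concrete, here is a rough sketch. The helper name all_threads_agree and the choice of y as thread id + 1 are assumptions for illustration only. The assignment is wrapped in a critical section, as described above, and the barrier guarantees that every thread then sees the same winner:

```c
#include <limits.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: returns 1 if, after the barrier, every thread
   read the same winning value of x. */
int all_threads_agree(void)
{
    int x = 0;                  /* shared */
    int lo = INT_MAX, hi = 0;   /* smallest and largest x any thread observed */

    #pragma omp parallel shared(x, lo, hi)
    {
        int y = 1;              /* private stand-in for a per-thread result */
#ifdef _OPENMP
        y = omp_get_thread_num() + 1;
#endif
        #pragma omp critical
        x = y;                  /* whichever thread gets here last "wins" */

        #pragma omp barrier     /* every write is done; implies a flush */

        int seen = x;           /* no writers remain, so a plain read is safe */

        #pragma omp critical
        {
            if (seen < lo) lo = seen;
            if (seen > hi) hi = seen;
        }
    }
    return lo == hi && lo >= 1;
}
```

Which thread's y ends up in x can differ from run to run; the sketch only checks that all threads agree on it once the barrier has been passed.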

Andrew Walker
Atomic assignment *is* useful. If the type of X and Y is "large" in any sense, doing an atomic copy ensures that the resulting value of X doesn't contain an inconsistent picture of any value of Y that might be copied. You also need, of course, an "atomic read" (you suggested a barrier) to make sure that fetching a component of X doesn't get you part of one value, as in the race situation you describe above. What I would have said is that the overhead of protecting individual copies is in general pretty high, so this might not be that helpful from a performance point of view.
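A sketch of what such a protected copy could look like, using a named critical section for both the write and the read. The type pair_t, its invariant (b == -a), and the function name copies_are_consistent are invented for this illustration and are not from the thread:

```c
/* Hypothetical "large" value; every valid value satisfies b == -a. */
typedef struct { long a; long b; } pair_t;

/* Returns 1 if no thread ever observed a half-written copy of x. */
int copies_are_consistent(int iterations)
{
    pair_t x = {0, 0};          /* shared */
    int torn = 0;               /* shared count of inconsistent snapshots */

    #pragma omp parallel shared(x, torn)
    {
        for (int i = 1; i <= iterations; ++i) {
            pair_t y = { i, -i };   /* private value to publish */
            pair_t snapshot;

            #pragma omp critical(x_copy)
            x = y;                  /* "atomic" copy of the whole struct */

            #pragma omp critical(x_copy)
            snapshot = x;           /* "atomic" read of the whole struct */

            if (snapshot.a + snapshot.b != 0) {
                #pragma omp atomic
                torn++;
            }
        }
    }
    return torn == 0;
}
```

Every copy and every read takes the same lock, which is the overhead mentioned above. For plain scalars, OpenMP 3.1 and later also provide `#pragma omp atomic write` and `#pragma omp atomic read`.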
Ira Baxter