On x86, it will turn into a lock prefixed assembly instruction, like LOCK XADD
.
Being a single instruction, it is non-interruptible. As an added "feauture", the lock prefix results in a full memory barrier:
"...locked operations serialize all outstanding load and store operations (that is, wait for them to complete)." ..."Locked operations are atomic with respect to all other memory operations and all externally visible events. Only instruction fetch and page table accesses can pass locked instructions. Locked instructions can be used to synchronize data written by one processor and read by another processor." - Intel® 64 and IA-32 Architectures Software Developer’s Manual, Chapter 8.1.2.
A memory barrier is in fact implemented as a dummy LOCK OR
or LOCK AND
in both the .NET and the JAVA JIT on x86/x64.
So you have a full fence on x86 as an added bonus, whether you like it or not. :)
On PPC, it is different. An LL/SC pair - lwarx
& stwcx
- with a subtraction inside can be used to load the memory operand into a register, subtract one, then either write it back if there was no other store to the target location, or retry the whole loop if there was. An LL/SC can be interrupted.
It also does not mean an automatic full fence.
This does not however compromise the atomicity of the counter in any way.
It just means that in the x86 case, you happen to get a fence as well, "for free".
On PPC, one can insert a full fence by emitting a (lw)sync
instruction.
All in all, explicit memory barriers are not necessary for the atomic counter to work properly.