Atomic load/store for OSs other than BSD?

Among the atomic operations provided by BSD (as given on the atomic(9) man page), there are atomic_load_acq_int() and atomic_store_rel_int(). In looking for the equivalent for other OSs (for example, by reading the atomic(3) man page for Mac OS X, the atomic_ops(3C) man page for Solaris, and the Interlocked*() functions for Windows), there don't seem to be any (obvious) equivalents for just atomically reading/writing an int.

Is this because that it's implied for those OSs that reads/writes for int are guaranteed to be atomic by default? (Or must you use declare them volatile in C/C++?)

If not, then how does one do atomic reads/writes of an int on those OSs?

(Atomic reads can be simulated by returning the result of an atomic add of 0, but there's no equivalent for doing atomic writes.)

I think you are mixing together atomic memory access with cache coherence. The former is the required hardware support for building synchronization primitives in software (spin-locks, semaphores, and mutexes), while the latter is the hardware support for multiple chips (several CPUs, and peripheral devices) working over the same bus, and having consistent view of the main memory.

Different compilers/libraries provide different utilities for the first. Here's, for example, GCC intrinsics for atomic memory access. They all boil down to generating either compare-and-swap or load-linked/store-conditional based instruction blocks depending on the platform support. Compile your source with, say, -S for GCC and see the assembler generated.

You don't have to do anything explicitly for cache coherency - it's all handled in hardware - but it definitely helps to understand how it works to avoid things like cache line ping-pong.

With all that, single word reads and writes are atomic on all commodity platforms (somebody correct me if I'm wrong here). Since ints are less or equal to processor word in size, you are covered (see the GCC builtins link above).

It's the order of reads and writes that is important. Here's where architecture memory model is important. It dictates what operations can and cannot be re-ordered by the hardware. Example would be updating a linked list - you don't want other CPUs see a new item linked until the item itself is in consistent state. Explicit memory barriers (also often called "memory fences") might be required. Acquire barrier ensures that subsequent operations are not re-ordererd before the barrier (say you read the linked-list item pointer before the content of the item), Release barrier ensures that previous operations are not re-ordered after the barrier (you write the item content before writing the new link pointer).

volatile is often misunderstood as being related to all the above. In fact it is just an instruction to the compiler not to cache variable value in register, but read it from memory on each access. Many argue that it's "almost useless" for concurrent programming.

Apologies for lengthy reply. Hope this clears it a bit.

Edit:

Upcoming C++0x standard finally addresses concurrency, see Hans Boehm's C++ memory model papers for many details.

ansaurus

tags:

views:

answers:

Atomic load/store for OSs other than BSD?

Edit:

related questions