Hi, just curious to know which CPU architectures support compare and swap atomic primitives?
Sparc v9 has a cas instruction. The SPARC v9 architecture manual discusses the use of the CAS instruction in Annex J, look specifically at examples J.11 and J.12.
I believe the name of the instruction is actually "casa", because it can access either the current address space or an alternate. "cas" is an assembler macro which accesses the current ASI.
There is also an article on developers.sun.com discussing the various atomic instructions which Sparc processors have implemented over the years, including cas.
Intel x86 has this support. IBM in it's Solaris to Linux Porting Guide gives this example:
bool_t My_CompareAndSwap(IN int *ptr, IN int old, IN int new)
{
unsigned char ret;
/* Note that sete sets a 'byte' not the word */
__asm__ __volatile__ (
" lock\n"
" cmpxchgl %2,%1\n"
" sete %0\n"
: "=q" (ret), "=m" (*ptr)
: "r" (new), "m" (*ptr), "a" (old)
: "memory");
return ret;
}
Powerpc has more powerful primitives available: "lwarx" and "stwcx"
lwarx loads a value from memory but remembers the location. Any other thread or cpu that touches that location will cause the "stwcx", a conditional store instruction, to fail.
So the lwarx /stwcx combo allows you to implement atomic increment / decrement, compare and swap, and more powerful atomic operations like "atomic increment circular buffer index"
--jeffk++
Starting with the ARMv6 architecture ARM has the LDREX/STREX instructions that can be used to implement an atomic compare-exchange operation.
Just to complete the list, MIPS has Load Linked (ll) and Store Conditional (sc) instructions which load a value from memory and later conditionally store if no other CPU has accessed the location. Its true that you can use these instructions to perform swap, increment, and other operations. However the disadvantage is that with a large number of CPUs exercising locks very heavily you get into livelock: the conditional store will frequently fail and necessitate another loop to try again, which will fail, etc.
The software mutex_lock implementation can become very complicated trying to implement an exponential backoff if these situations are considered important enough to worry about. In one system I worked on with 128 cores, they were.
A different and easier way to answer this question may be to list multiprocessor platforms that do NOT support a compare and swap (or a load-link/store-conditional that can be used to write one).
The only one I know of is PARISC, which only has an atomic clear word instruction. This can be used to construct a mutex (provided one aligns the word on a 16 byte boundary). There is no CAS on this archetecture (unlike x86, ia64, ppc, sparc, mips, s390, ...)
Compare and swap was added to IBM mainframes in 1973. It (and compare double and swap) are still on the IBM mainframes (along with more recent multi-processor functions like PLO - perform locked operation).