Hi, just curious to know which CPU architectures support compare and swap atomic primitives?

+2  A: 

SPARC v9 has a cas instruction. The SPARC v9 architecture manual discusses its use in Annex J; see examples J.11 and J.12 in particular.

I believe the instruction is actually named "casa", because it can access either the current address space or an alternate one. "cas" is an assembler macro that accesses the current ASI (address space identifier).

There is also an article on developers.sun.com discussing the atomic instructions that SPARC processors have implemented over the years, including cas.

DGentry
What is it? Can you give a link?
ceretullis
I edited the answer to provide more details and links.
DGentry
Note, though, that x86 has double-word CAS and the other non-SPARC CPUs have ll/sc, both of which solve ABA with a counter. Single-word CAS does not permit solving ABA with a counter, so SPARC is badly disadvantaged compared to other architectures.
Blank Xavier
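To illustrate the "solve ABA with a counter" idea from the comment above, here is a minimal sketch in C11. It assumes the value and a modification counter can be packed into one 64-bit word that the hardware can CAS as a unit; the names `tagged_t`, `pack`, and `tagged_cas` are hypothetical, not from any library.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: low 32 bits hold the value (for example a node
   index), high 32 bits hold a modification counter. */
typedef _Atomic uint64_t tagged_t;

uint64_t pack(uint32_t value, uint32_t counter)
{
    return ((uint64_t)counter << 32) | value;
}

uint32_t tagged_value(uint64_t t)   { return (uint32_t)t; }
uint32_t tagged_counter(uint64_t t) { return (uint32_t)(t >> 32); }

/* Replace the value only if the value-and-counter pair is still exactly
   what the caller observed. Every successful update bumps the counter, so
   an A -> B -> A sequence leaves a different counter behind and a CAS
   based on the stale observation fails. */
bool tagged_cas(tagged_t *loc, uint64_t observed, uint32_t new_value)
{
    uint64_t desired = pack(new_value, tagged_counter(observed) + 1);
    return atomic_compare_exchange_strong(loc, &observed, desired);
}
```

A thread that read the pair before another thread changed the value from 10 to 20 and back to 10 would see its `tagged_cas` fail, even though the value itself looks unchanged.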
+2  A: 

x86 and Itanium have CMPXCHG (compare and exchange).

Darksider
Note to old hardware hackers: this instruction wasn't added until the i486.
Brian Knoblauch
That's a note to young hackers, isn't it?
Peeter Joot
+4  A: 

Intel x86 has this support. IBM, in its Solaris to Linux Porting Guide, gives this example:

bool_t My_CompareAndSwap(IN int *ptr, IN int old, IN int new)
{
        unsigned char ret;

        /* Note that sete sets a 'byte' not the word */
        __asm__ __volatile__ (
                "  lock\n"
                "  cmpxchgl %2,%1\n"
                "  sete %0\n"
                : "=q" (ret), "=m" (*ptr)
                : "r" (new), "m" (*ptr), "a" (old)
                : "memory");

        return ret;
}
mat_geek
This code is wrong. For example, it doesn't clobber cc.
Blank Xavier
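One way to avoid getting inline-assembly constraints wrong is to let the compiler generate the instruction. The following is a sketch using C11 `<stdatomic.h>`, with the same true-on-success convention as the example above; the function name `compare_and_swap` is hypothetical, and this is an alternative approach, not the porting guide's code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Returns true if *ptr equalled old_val and was replaced by new_val. */
bool compare_and_swap(atomic_int *ptr, int old_val, int new_val)
{
    /* On failure the current value of *ptr is written back into old_val.
       old_val is a local copy here, so the caller's variable is untouched. */
    return atomic_compare_exchange_strong(ptr, &old_val, new_val);
}
```

On x86 a compiler would typically emit a lock-prefixed cmpxchg for this call, though that depends on the compiler and target.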
+4  A: 

PowerPC has more powerful primitives available: "lwarx" and "stwcx".

lwarx loads a value from memory and remembers the location. If any other thread or CPU touches that location in the meantime, the subsequent "stwcx", a conditional store instruction, fails.

So the lwarx/stwcx combo allows you to implement atomic increment/decrement, compare and swap, and more powerful atomic operations like "atomic increment circular buffer index".

--jeffk++

jdkoftinoff
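As a rough illustration of the "atomic increment circular buffer index" operation mentioned above, here is a sketch in portable C11 rather than PowerPC assembly. It uses a compare-exchange retry loop, which has the same load, compute, conditionally store shape as an lwarx/stwcx sequence; the function name `ring_next` is hypothetical.

```c
#include <stdatomic.h>

/* Atomically advance *index to (*index + 1) % size and return the slot
   that was claimed (the value before the increment). size must be nonzero. */
unsigned ring_next(atomic_uint *index, unsigned size)
{
    unsigned old = atomic_load(index);
    unsigned next;
    do {
        next = (old + 1) % size;
        /* The weak form may fail spuriously, much as a conditional store
           can. On failure, old is refreshed with the current value and the
           loop recomputes next. */
    } while (!atomic_compare_exchange_weak(index, &old, next));
    return old;
}
```

With `size` equal to 4 and the index starting at 2, successive calls return 2, 3, 0 and leave the index at 1.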
x86, too, has atomic increment/decrement (`lock inc`/`lock dec`) and atomic exchange-and-add (`xadd`).
Anton Tykhyy
The nice thing about lwarx and stwcx is that lock inc/lock dec are not the only things you can implement with them. They give you a building block for software transactional memory (STM) with good scalability across multiple cores.
jdkoftinoff
+3  A: 

Starting with the ARMv6 architecture, ARM has the LDREX/STREX instructions, which can be used to implement an atomic compare-exchange operation.

Michael Burr
Is ARM's LDREX/STREX similar to PPC's LWARX/STWCX?
ceretullis
I believe so - the ARM Tech Ref manual's explanation of LDREX/STREX is rather complex (and for the PowerPC I'm going by Jeff Koftinoff's explanation) so there may well be some difference in the details.
Michael Burr
+2  A: 

Just to complete the list, MIPS has Load Linked (ll) and Store Conditional (sc) instructions, which load a value from memory and later conditionally store if no other CPU has accessed the location. It's true that you can use these instructions to perform swap, increment, and other operations. However, the disadvantage is that with a large number of CPUs exercising locks very heavily you can get into livelock: the conditional store will frequently fail and necessitate another loop to try again, which will fail, etc.

The software mutex_lock implementation can become very complicated trying to implement an exponential backoff if these situations are considered important enough to worry about. In one system I worked on with 128 cores, they were.

DGentry
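For readers wondering what exponential backoff looks like in a lock, here is a minimal C11 sketch. The type `backoff_lock`, the function names, and the `BACKOFF_MIN`/`BACKOFF_MAX` values are all hypothetical, and this is not the implementation from the 128-core system described above.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define BACKOFF_MIN 1u      /* hypothetical tuning values */
#define BACKOFF_MAX 1024u

typedef struct { atomic_bool held; } backoff_lock;

/* Burn roughly n iterations without touching the lock word. */
void spin_pause(unsigned n)
{
    for (volatile unsigned i = 0; i < n; i++) { }
}

/* Single attempt; returns true if the lock was acquired. */
bool backoff_lock_try(backoff_lock *l)
{
    bool expected = false;
    return atomic_compare_exchange_strong_explicit(
        &l->held, &expected, true,
        memory_order_acquire, memory_order_relaxed);
}

void backoff_lock_acquire(backoff_lock *l)
{
    unsigned delay = BACKOFF_MIN;
    while (!backoff_lock_try(l)) {
        /* Each failed attempt waits longer before retrying, which
           reduces the number of CPUs hammering the same location. */
        spin_pause(delay);
        if (delay < BACKOFF_MAX)
            delay *= 2;
    }
}

void backoff_lock_release(backoff_lock *l)
{
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```

Real implementations often add randomization to the delay so that waiters do not retry in lockstep; that is left out here to keep the sketch short.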
I agree, lock contention is something that has to be watched very carefully when using non-locking data-structures (which typically use CAS). Thanks for the note.
ceretullis
+1  A: 

A different and easier way to answer this question may be to list multiprocessor platforms that do NOT support a compare and swap (or a load-link/store-conditional that can be used to write one).

The only one I know of is PA-RISC, which only has an atomic clear word instruction. This can be used to construct a mutex (provided one aligns the word on a 16-byte boundary). There is no CAS on this architecture (unlike x86, ia64, ppc, sparc, mips, s390, ...).

Peeter Joot
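To show how a mutex can be built from nothing more than an atomic clear, here is a sketch in C11. It emulates the load-and-clear operation with `atomic_exchange`, so it is not PA-RISC code, and it assumes the convention that a nonzero word means "free"; the names `clear_lock` and `load_and_clear` are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed convention: nonzero means free, zero means held, so that
   "clear the word and look at what was there" acquires the lock. */
typedef struct {
    _Alignas(16) atomic_uint word;   /* 16-byte alignment, as noted above */
} clear_lock;

void clear_lock_init(clear_lock *l) { atomic_init(&l->word, 1u); }

/* Stand-in for an atomic load-and-clear instruction. */
unsigned load_and_clear(atomic_uint *w)
{
    return atomic_exchange_explicit(w, 0u, memory_order_acquire);
}

/* Whoever reads back a nonzero value is the one who took the lock. */
bool clear_lock_try(clear_lock *l)
{
    return load_and_clear(&l->word) != 0;
}

void clear_lock_acquire(clear_lock *l)
{
    while (!clear_lock_try(l)) {
        /* Wait with plain loads until the word looks free again. */
        while (atomic_load_explicit(&l->word, memory_order_relaxed) == 0) { }
    }
}

void clear_lock_release(clear_lock *l)
{
    atomic_store_explicit(&l->word, 1u, memory_order_release);
}
```

Clearing an already-cleared word is harmless, which is why repeated failed attempts do not disturb the holder.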
Great info Peeter.
ceretullis
+1  A: 

Compare and swap was added to IBM mainframes in 1973. It and compare double and swap are still available on IBM mainframes, along with more recent multiprocessor functions such as PLO (perform locked operation).

s.holton