ansaurus

Question

Which CPU architectures support Compare And Swap (CAS)?

Answer 1

+2 A:

SPARC v9 has a cas instruction. The SPARC v9 architecture manual discusses its use in Annex J; see examples J.11 and J.12 in particular.

I believe the name of the instruction is actually "casa", because it can access either the current address space or an alternate one. "cas" is an assembler macro which accesses the current ASI.

There is also an article on developers.sun.com discussing the various atomic instructions which SPARC processors have implemented over the years, including cas.

DGentry 2008-09-30 04:46:22

What is it? Can you give a link?

ceretullis 2008-09-30 04:51:47

I edited the answer to provide more details and links.

DGentry 2008-09-30 05:23:16

Note though that x86 has double-word CAS and the other non-SPARC CPUs have ll/sc, both of which solve ABA with a counter. Single-word CAS does not permit solving ABA with a counter, and as such SPARC is badly disadvantaged compared to other architectures.

Blank Xavier 2009-11-04 12:30:40
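
The counter technique mentioned in the comment above pairs the value with a version count and swaps both in one atomic operation, so a value that went A -> B -> A no longer compares equal. A minimal sketch in C11, assuming the pair fits in one 64-bit atomic (a 32-bit index plus a 32-bit counter); all names here are hypothetical, for illustration only:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: low 32 bits = index, high 32 bits = version counter. */
static inline uint64_t tagged_pack(uint32_t index, uint32_t version) {
    return ((uint64_t)version << 32) | index;
}
static inline uint32_t tagged_index(uint64_t t)   { return (uint32_t)t; }
static inline uint32_t tagged_version(uint64_t t) { return (uint32_t)(t >> 32); }

/* Install new_index only if the whole (index, version) pair still equals
 * `seen`. The version is bumped on every successful update, so a stale
 * `seen` fails even if the index has returned to its old value. */
static bool tagged_replace(_Atomic uint64_t *slot, uint64_t seen, uint32_t new_index) {
    uint64_t desired = tagged_pack(new_index, tagged_version(seen) + 1);
    return atomic_compare_exchange_strong(slot, &seen, desired);
}
```

With pointer-sized payloads the same idea needs a CAS twice as wide as a pointer, which is the double-word CAS the comment refers to.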

Answer 2

+2 A:

x86 and Itanium have CMPXCHG (compare and exchange).

Darksider 2008-09-30 04:49:46

Note to old hardware hackers: this instruction wasn't added until the i486.

Brian Knoblauch 2008-09-30 11:58:05

That's a note to young hackers, isn't it?

Peeter Joot 2009-11-27 22:36:06

Answer 3

+4 A:

Intel x86 has this support. IBM, in its Solaris to Linux Porting Guide, gives this example:

bool_t My_CompareAndSwap(IN int *ptr, IN int old, IN int new)
{
        unsigned char ret;

        /* Note that sete sets a 'byte' not the word */
        __asm__ __volatile__ (
                "  lock\n"
                "  cmpxchgl %2,%1\n"
                "  sete %0\n"
                : "=q" (ret), "=m" (*ptr)
                : "r" (new), "m" (*ptr), "a" (old)
                : "memory");

        return ret;
}

mat_geek 2008-09-30 04:51:58

This code is wrong. For example, it doesn't clobber cc.

Blank Xavier 2009-11-04 12:29:03
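
One way to sidestep hand-written asm constraints altogether is C11's <stdatomic.h>. A minimal sketch, assuming a C11 compiler; the name compare_and_swap_int is made up for illustration and is not from the IBM guide:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Returns true if *ptr equaled old_val and was atomically replaced by
 * new_val; returns false (and leaves *ptr untouched) otherwise. */
static bool compare_and_swap_int(atomic_int *ptr, int old_val, int new_val) {
    /* On failure the library writes the current value into old_val;
     * that only modifies our local copy, so the caller sees nothing. */
    return atomic_compare_exchange_strong(ptr, &old_val, new_val);
}
```

The compiler picks the instruction sequence for the target, so the same source should work on x86 as well as on LL/SC machines.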

Answer 4

+4 A:

PowerPC has more powerful primitives available: "lwarx" and "stwcx"

lwarx loads a value from memory and places a reservation on the location. If any other thread or CPU writes to that location in the meantime, the "stwcx", a conditional store instruction, fails.

So the lwarx/stwcx combo allows you to implement atomic increment/decrement, compare and swap, and more powerful atomic operations like "atomic increment circular buffer index".

--jeffk++

jdkoftinoff 2008-09-30 05:15:13
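
The "atomic increment circular buffer index" case can be sketched in portable C11 as a load / compute / conditional-store retry loop, which has the same shape as an lwarx/stwcx sequence. This is an illustration, not PowerPC assembly; the function name is hypothetical and size is assumed to be nonzero:

```c
#include <stdatomic.h>

/* Atomically advance *index to (*index + 1) % size and return the
 * value it held before the update. Assumes size > 0. */
static unsigned ring_index_advance(atomic_uint *index, unsigned size) {
    unsigned old = atomic_load(index);
    unsigned next;
    do {
        next = (old + 1) % size;
        /* If another thread changed *index (or the weak CAS failed
         * spuriously), old is refreshed with the current value and
         * the loop recomputes next. */
    } while (!atomic_compare_exchange_weak(index, &old, next));
    return old;
}
```

The weak form is allowed to fail spuriously, which lets the compiler use a bare load-reserve/store-conditional pair on architectures that have one.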

x86, too, has atomic increment/decrement (`lock inc`/`lock dec`) and atomic exchange-and-add (`xadd`).

Anton Tykhyy 2010-08-17 01:03:15

The nice thing with lwarx and stwcx is that lock inc/lock dec are not the only things you can implement with them. They give you a building block for software transactional memory (STM) with good scalability across multiple cores.

jdkoftinoff 2010-08-18 04:39:02

Answer 5

+3 A:

Starting with the ARMv6 architecture, ARM has the LDREX/STREX instructions, which can be used to implement an atomic compare-exchange operation.

Michael Burr 2008-09-30 05:48:14

Is ARM's LDREX/STREX similar to PPC's LWARX/STWCX?

ceretullis 2008-09-30 05:56:27

I believe so - the ARM Tech Ref manual's explanation of LDREX/STREX is rather complex (and for the PowerPC I'm going by Jeff Koftinoff's explanation) so there may well be some difference in the details.

Michael Burr 2008-09-30 06:31:07

Answer 6

+2 A:

Just to complete the list, MIPS has Load Linked (ll) and Store Conditional (sc) instructions, which load a value from memory and later conditionally store if no other CPU has accessed the location. It's true that you can use these instructions to perform swap, increment, and other operations. However, the disadvantage is that with a large number of CPUs exercising locks very heavily you can get into livelock: the conditional store will frequently fail and necessitate another loop to try again, which will fail, etc.

The software mutex_lock implementation can become very complicated trying to implement an exponential backoff if these situations are considered important enough to worry about. In one system I worked on with 128 cores, they were.

DGentry 2008-10-01 04:00:41
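
To make the backoff idea concrete, here is a minimal sketch of a spin lock with exponential backoff in C11. It is not the implementation from the 128-core system mentioned above; the type and function names, the busy-wait delay loop, and the 1024-iteration cap are all assumptions chosen for illustration:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type: 0 = free, 1 = held. */
typedef struct { atomic_int locked; } backoff_lock;

/* Single attempt: succeeds only if the lock was free. */
static bool backoff_try_lock(backoff_lock *l) {
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->locked, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

/* Retry with a delay that doubles after each failure, up to a cap,
 * so contending CPUs spread their attempts out instead of colliding. */
static void backoff_lock_acquire(backoff_lock *l) {
    unsigned delay = 1;
    const unsigned max_delay = 1024;   /* arbitrary cap for the sketch */
    while (!backoff_try_lock(l)) {
        for (volatile unsigned i = 0; i < delay; i++) {
            /* busy-wait; a real lock might use a pause/yield hint here */
        }
        if (delay < max_delay) {
            delay *= 2;
        }
    }
}

static void backoff_lock_release(backoff_lock *l) {
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

Production locks often add randomization to the delay so that waiters do not retry in lockstep.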

I agree, lock contention is something that has to be watched very carefully when using non-locking data-structures (which typically use CAS). Thanks for the note.

ceretullis 2008-10-01 17:15:43

Answer 7

+1 A:

A different and easier way to answer this question may be to list multiprocessor platforms that do NOT support a compare and swap (or a load-link/store-conditional that can be used to write one).

The only one I know of is PA-RISC, which only has an atomic clear word instruction. This can be used to construct a mutex (provided one aligns the word on a 16-byte boundary). There is no CAS on this architecture (unlike x86, ia64, ppc, sparc, mips, s390, ...).

Peeter Joot 2009-11-26 15:32:39
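
A rough sketch of how a mutex can be built from nothing but an atomic "fetch the word and leave zero behind" operation. C11's atomic_exchange stands in for the hardware instruction here, and the convention that nonzero means free, along with all type and function names, is an assumption made for illustration:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical mutex: nonzero = free, 0 = held. The word is aligned to
 * 16 bytes to mirror the alignment requirement mentioned above. */
typedef struct { _Alignas(16) atomic_uint word; } clear_word_mutex;

/* Atomically read the word and clear it. Whoever reads a nonzero value
 * has taken the lock; everyone else reads the zero left behind. */
static bool cw_try_lock(clear_word_mutex *m) {
    return atomic_exchange_explicit(&m->word, 0u, memory_order_acquire) != 0u;
}

/* Releasing is a plain store of a nonzero value. */
static void cw_unlock(clear_word_mutex *m) {
    atomic_store_explicit(&m->word, 1u, memory_order_release);
}
```

This gives mutual exclusion but not a general CAS: the only value the atomic operation can write is zero.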

Great info Peeter.

ceretullis 2009-11-27 14:39:15

Answer 8

+1 A:

Compare and swap was added to IBM mainframes in 1973. It (along with compare double and swap) is still on the IBM mainframes, together with more recent multi-processor functions like PLO (perform locked operation).

s.holton 2010-08-17 00:53:10
