views:

518

answers:

2

Suppose I'm writing (assembly) code for a CPU whose only atomic operation is an unconditional swap -- no LL/SC, no compare-and-swap, just plain swap. (An ARM9 would be an example of such a beast.) Is there a way to perform atomic increment/decrement operations using the swap operation?

There is a relatively easy answer, which is to use the swap to build a spinlock, and use that to provide a critical section around ordinary increment and decrement. But that seems clunky, and I know it can be done without a lock if CAS or LL/SC are available. So what I'm really wondering is if there's a way to do it without using a lock.

+3  A: 

I can't think of another way to do it, simply because you need to both swap and compare to detect if you're allowed to proceed. If you don't have a compare-and-swap command, you'll have to implement it with a looping swap and compare, something like:

; Emulate atomic add/sub with atomic swap.
; On entry:
;   r0 contains address of variable
;   r1 contains value to add or subtract.

mutex:    defw    0           ; mutual exclusion semaphore (0 means free, 1 means busy).

chng:     push    r2          ; save variables.
          ld      r2,1        ; claiming value.
spin:     swap    r2,(mutex)  ; atomic swap (sounds like a good name for a band).
          bnz     spin        ; loop until you have it.
          add     (r0),r1     ; emulated atomic change.
          swap    r2,(mutex)  ; free mutex for everyone else.
          pop     r2          ; restore registers.
          ret

It's only really klunky if you're doing it in a lot of places in your code. I've often found that isolating 'klunky' code to a function (like above) makes it far less klunky since you then end up with lots of code segments looking like the much simpler:

myvar:    defw    0
          : : : : :
          ld      r0,myvar
          ld      r1,1        ; increment
          call    chng

or, if you want your code even simpler, provide separate incr and decr functions:

; Emulate atomic incr/decr with emulated atomic change.
; On entry:
;   r0 contains address of variable

incr:     push    r1          ; save registers.
          ld      r1,1        ; increment.
          call    chng        ; do it.
          pop     r1          ; restore registers.
          ret
decr:     push    r1          ; save registers.
          ld      r1,-1       ; decrement.
          call    chng        ; do it.
          pop     r1          ; restore registers.
          ret

Then your code sequences become:

          ld      r0,myvar
          call    incr

or, if you can do macros, an even simpler:

atincr:   defm                ; do this once to define macro
          ld      r0,&1
          call    incr
          endm

          atincr  myvar       ; do this in your code, as much as you like.
paxdiablo
Pretty much what I was thinking, but well said. There's additional clunkiness, though, if you want to do this on a single-core machine: in that case a thread that can't acquire the spin lock will chew up cycles until the OS decides to timeslice (or change the code to yield in that case). With a compare-and-swap version, the running thread can always complete the increment, even if another thread was interrupted in the middle of an increment to the same location.
rcbilson
That is a good point since the time of contention for CAS is writing, not reading as in this case, but moot for this question since no CAS is available :-) You would have to yield if possible for this solution.
paxdiablo
The only problem with this approach is that you are using a single mutex everywhere. If you're on SMP, and if more threads can modify different vars, then you'll probably want to have a lock per var. Or to choose among a few mutexes according to the address of the var (hashed spinlocks). In that case, make sure that different spinlocks are on different cachelines. If you're insulating against device access, this is unneeded.Are there ARM SMP machines, somewhere?
Blaisorblade
A: 

If your CPU is single core than you can use this way, of ocurse without swaping, but you must be carefull. Here is s simpe case:

//incrementing of variable
cli        //disable interrupts if we aren't on high priviliegied code execution level
//perform non atomic increment of variable
sti        //enable interrupts if we aren't on high priviliegied code execution level

//reading variable
cli        //disable interrupts if we aren't on high priviliegied code execution level
//perform non atomic read operation
sti        //enable interrupts if we aren't on high priviliegied code execution level

This method works like a lock but is much more smart and fast. The only weakness of this method is possible CPU interrupt delay, but if is your code inside disabled interrupts short and fast than this delay normaly isn't critical.

GJ
Yes, for a simple operation this is a good approach, but the hitch is that in order to disable interrupts the code must be running in a privileged processor mode. That's a significant restriction, and the specifics of my applcation rule it out entirely.
rcbilson