I've been messing around with some x86 assembly, as it's come up in a number of my classes. In particular, I've wanted to expose compare-and-swap (CAS) as a user function, with the intent of implementing my own locks.

I'm using Linux 2.6.31 with GCC 4.1.1 on an Intel CPU.

I have the following:

// int cmpxchg(int *dest, int expected, int update)
.globl cmpxchg
cmpxchg:
  pushl %ebp
  movl  %esp, %ebp

  // edx holds dest
  movl 8(%ebp), %edx
  // eax holds expected value
  movl 12(%ebp), %eax
  // ecx holds the new value
  movl 16(%ebp), %ecx

  // cmpxchg dest_addr, exp_value
  // compare to %eax is implicit
  lock cmpxchgl %edx, %ecx

  leave
  ret

This is within a *.s file, which I compile with my driver program. When I include the line

  lock cmpxchgl %edx, %ecx

and execute, I receive an "Illegal instruction" error. When I replace the line with

  cmpxchgl %edx, %ecx

my code seems to run fine.

First off, is lock necessary? I'm not sure whether cmpxchgl is naturally atomic, so I used lock to be sure. As a userland program, am I even allowed to use lock?

Thanks

================================================================

My final code (for those who may wander here in the future):

// int cmpxchg(int *dest, int expected, int update)
.globl cmpxchg
cmpxchg:
  pushl %ebp
  movl  %esp, %ebp

  // edx holds dest, use eDx for Destination ;-)
  movl 8(%ebp), %edx
  // eax holds expected value implicitly
  movl 12(%ebp), %eax

  // cmpxchg dest_add, src_value
  lock cmpxchgl %edx, 16(%ebp)

  leave
  ret
+1  A: 

Your program compiles fine here (GNU as 2.20) (I pasted it into test.s and ran as -o test.o test.s)

As for the lock, Intel's documentation says:

This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically. To simplify the interface to the processor's bus, the destination operand receives a write cycle without regard to the result of the comparison. The destination operand is written back if the comparison fails; otherwise, the source operand is written into the destination. (The processor never produces a locked read without also producing a locked write.)
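To make the quoted behaviour concrete, here is a small reference model of what CMPXCHG does, written in plain C. It is only a sketch: the function name `cmpxchg_model` and its parameters are made up for illustration, and the model is deliberately NOT atomic (it describes the data flow, not the bus locking). The return value stands in for the ZF flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, NON-atomic model of "cmpxchg src, dest" (AT&T operand order).
 * dest : the destination operand (memory in the locked form)
 * eax  : the accumulator, compared implicitly against *dest
 * src  : the source register holding the new value
 * Returns what ZF would be: true if the exchange happened. */
bool cmpxchg_model(int *dest, int *eax, int src)
{
    assert(dest != NULL && eax != NULL);
    if (*dest == *eax) {
        *dest = src;      /* match: source is written into the destination */
        return true;      /* ZF = 1 */
    }
    *eax = *dest;         /* mismatch: accumulator receives the current value */
    return false;         /* ZF = 0 */
}
```

Either way, the accumulator ends up holding the value that was in the destination before the instruction ran, which is why a routine that simply returns %eax reports the old value.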

Gonzalo
A: 

This seems unlikely to be the source of the problem, but the official documentation also states that the instruction is not supported earlier than the 486 architecture. Is this occurring in real mode, with protected mode overrides?

wallyk
Unfortunately I'm not yet familiar with all the terminology you just used :-/ I'm running on a Q6600 quad-core Intel, so I'm sure the processor is new enough. What do you mean by real mode and protected mode overrides? A link to an explanation is fine.
Willi Ballenthin
Yes, the CPU is plenty modern. Real mode is the boot-up state of an x86 CPU. It pretty much acts like a 1980s-era 8086 in that mode. 32-bit and 64-bit operating systems enable "protected mode" during initialization, which provides modern features like an address space larger than 64K, virtual memory, etc.
wallyk
+5  A: 

You need cmpxchgl %edx, (%ecx)

This operation doesn't make sense unless the destination is a memory operand; however, the instruction encoding allows a register destination. With the lock prefix, the CPU will fault if the instruction uses a register destination.

I tried it; your code works with a memory operand. I don't know if you realize this, but this sequence (with a register destination) has a popular name: "the f00fc7c8 bug" or "the F00F bug". In the Pentium days this was an "HCF" (halt and catch fire) or "killer poke" instruction, as it would generate an exception which it would not then be able to service because the bus was locked, and it was callable from user mode. I think there may have been an OS-level software workaround.

DigitalRoss
Hm, right now an address (via a pointer) is loaded to %edx, and a value (integer) is loaded to %ecx. Is this what you mean? Forgive my ignorance, I'm only 6 hours into assembly...
Willi Ballenthin
You have to put an address in %ecx and use a memory addressing mode for the destination, such as `(%ecx)` or `offset(%ecx)`.
DigitalRoss
@Ross: The f00f bug has to do with the `CMPXCHG8B` instruction, which is not the same as the `CMPXCHG` instruction the OP is using.
bcat
Oh, right. Well, he got close, his opcode *does* start with `F00F`.
DigitalRoss
True. I had to look it up to double-check. :)
bcat
+2  A: 

Ross's answer already says most of this, but I'll try and clarify a couple of things.

  1. Yes, a LOCK prefix is necessary if you want atomicity. The only exception to this is the XCHG (not CMPXCHG) instruction, which is locked by default, as asveikau pointed out.
  2. Yes, it's perfectly legal to use LOCK from user-mode code.
  3. Yes, it's perfectly legal to use CMPXCHG with a register destination operand.

That said, it's not legal to use a LOCK CMPXCHG together with a register destination operand. Quoting volume 2A of the IA-32 manual (page 3-538 in my copy):

The LOCK prefix can be prepended only to the following instructions and only to those forms of the instructions where the destination operand is a memory operand: ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCHG8B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG.
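Since the stated goal was to build locks, here is a rough sketch of a user-mode spinlock on top of a compare-and-swap with a memory destination. It assumes GCC's `__sync` builtins (available since GCC 4.1) rather than hand-written assembly; on x86 these typically compile down to a locked CMPXCHG on the memory operand. The type `cas_spinlock` and the function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical lock type: 0 = free, 1 = held. */
typedef struct { volatile int held; } cas_spinlock;

/* Try once: atomically change held from 0 to 1.
 * Returns 1 if we took the lock, 0 if someone else holds it. */
int spin_trylock(cas_spinlock *l)
{
    return __sync_val_compare_and_swap(&l->held, 0, 1) == 0;
}

/* Busy-wait until the CAS succeeds. */
void spin_lock(cas_spinlock *l)
{
    while (!spin_trylock(l)) {
        /* spin */
    }
}

/* Release: store 0 with release semantics. */
void spin_unlock(cas_spinlock *l)
{
    assert(l->held == 1);
    __sync_lock_release(&l->held);
}
```

This is only meant to show the shape of the idea; a production lock would want backoff or a fallback to a blocking primitive instead of spinning indefinitely.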

bcat
interesting tidbit: the one atomic primitive you don't need LOCK for is XCHG (atomic swap). I think this is because that instruction predates the LOCK prefix.
asveikau
Oh yeah, that's a good point. I'll edit my answer to clarify.
bcat
The XCHG instruction does not predate the LOCK prefix; both were available on the original 8086.
I. J. Kennedy
A: 

Curious, is this final code still correct? From what I can see, the operands are reversed: the instruction compares %eax (the expected value) with the stack slot at 16(%ebp), which holds the update value, and on a match it stores %edx (the pointer itself, i.e., the address) into that slot. The memory that dest points to is never touched. In other words, rather than:

lock cmpxchgl %edx, 16(%ebp)

I would think you would want something like:

//move the update value into ecx register
movl 16(%ebp), %ecx

//do the comparison between the value at the address pointed to by edx and eax,
//and if they are the same, copy ecx into the address being pointed to by edx
lock cmpxchgl %ecx, (%edx)

Did the original code actually work as planned (not just compile), and if not, did you end up re-organizing the code so it looks more like the above?
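For comparison, the same operand arrangement can be written as a C function with GCC extended inline assembly, which lets the compiler pick the registers. This is a sketch under a few assumptions: the prototype is the one from the question, the local name `old` and the non-x86 fallback to `__sync_val_compare_and_swap` are my own additions, and the return value is taken to be the value found in *dest before the operation.

```c
#include <assert.h>
#include <stddef.h>

/* int cmpxchg(int *dest, int expected, int update)
 * Returns the value that was in *dest before the operation;
 * the swap happened if and only if that equals expected. */
int cmpxchg(int *dest, int expected, int update)
{
    assert(dest != NULL);
#if defined(__i386__) || defined(__x86_64__)
    int old = expected;               /* lives in %eax via the "a" constraint */
    __asm__ __volatile__(
        "lock cmpxchgl %2, %1"        /* source register, memory destination */
        : "+a"(old), "+m"(*dest)
        : "r"(update)
        : "memory", "cc");
    return old;
#else
    /* Assumed fallback for non-x86 targets. */
    return __sync_val_compare_and_swap(dest, expected, update);
#endif
}
```

A caller can then test for success with something like `cmpxchg(&x, expected, update) == expected`.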

Jason R