The x64 architecture supports a 64-bit compare-exchange using the good, old cmpexch
instruction. Or you could also use the somewhat more complicated cmpexch8b
instruction (from the "AMD64 Architecture Programmer's Manual Volume 1: Application Programming"):
The CMPXCHG instruction compares a
value in the AL or rAX register with
the first (destination) operand, and
sets the arithmetic flags (ZF, OF, SF,
AF, CF, PF) according to the result.
If the compared values are equal, the
source operand is loaded into the
destination operand. If they are not
equal, the first operand is loaded
into the accumulator. CMPXCHG can be
used to try to intercept a semaphore,
i.e. test if its state is free, and if
so, load a new value into the
semaphore, making its state busy. The
test and load are performed
atomically, so that concurrent
processes or threads which use the
semaphore to access a shared object
will not conflict.
The CMPXCHG8B
instruction compares the 64-bit values
in the EDX:EAX registers with a 64-bit
memory location. If the values are
equal, the zero flag (ZF) is set, and
the ECX:EBX value is copied to the
memory location. Otherwise, the ZF
flag is cleared, and the memory value
is copied to EDX:EAX.
The CMPXCHG16B
instruction compares the 128-bit value
in the RDX:RAX and RCX:RBX registers
with a 128-bit memory location. If the
values are equal, the zero flag (ZF)
is set, and the RCX:RBX value is
copied to the memory location.
Otherwise, the ZF flag is cleared, and
the memory value is copied to rDX:rAX.
Different assembler syntaxes may need to have the length of the operations specified in the instruction mnemonic if the size of the operands can't be inferred. This may be the case for GCC's inline assembler - I don't know.