The main fixes needed in the original version from the question are to use register-indirect addressing and to take the lock DWORD by reference (or pointer) rather than by value.
Here's a working solution for Visual C++. EDIT: I have worked offline with the author, and we have verified that the code in this answer works correctly in his test harness.
But if you're using Windows, you should really be using the Interlocked API (i.e. InterlockedExchange).
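As a rough sketch of what the Interlocked route could look like, here is the same 0 = busy / 1 = free convention built on InterlockedExchange. The class name InterlockedSpinLock and the tryLock helper are made up for illustration, and the lock word is a LONG rather than a DWORD because that is the type InterlockedExchange takes. The non-Windows branch is only a stand-in (using a GCC/Clang builtin) so the sketch compiles elsewhere; it is not the real Win32 API.

```cpp
#ifdef _WIN32
#include <windows.h>
#else
// Non-Windows stand-in so the sketch compiles elsewhere. This is NOT the real
// Win32 API, just a shim with the same shape built on a GCC/Clang builtin.
typedef long LONG;
inline LONG InterlockedExchange(volatile LONG* target, LONG value)
{
    return __atomic_exchange_n(target, value, __ATOMIC_SEQ_CST);
}
#endif

// Hypothetical wrapper (names are illustrative only).
// Same convention as the answer: 0 = in use / busy, 1 = free / available.
class InterlockedSpinLock
{
public:
    // Returns true if the lock was free and now belongs to the caller.
    static bool tryLock(volatile LONG& resourceLock)
    {
        // Atomically store 0 and get the previous value back.
        return InterlockedExchange(&resourceLock, 0) != 0;
    }

    static void lockResource(volatile LONG& resourceLock)
    {
        while (!tryLock(resourceLock))
        {
            // Previous value was 0 (busy): keep spinning.
        }
    }

    static void unLockResource(volatile LONG& resourceLock)
    {
        // Interlocked store of 1 (free); this also acts as a full barrier.
        InterlockedExchange(&resourceLock, 1);
    }
};
```

Usage mirrors the inline-assembly version: declare `volatile LONG aaa = 1;`, then call `InterlockedSpinLock::lockResource(aaa)` and `InterlockedSpinLock::unLockResource(aaa)` around the protected code.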
Edit: As noted by CAF, lock xchg is not required because xchg with a memory operand automatically asserts a BusLock.
I also added a faster version that does a non-locking read before attempting the xchg. This significantly reduces BusLock contention on the memory interface. The algorithm can be sped up quite a bit more (in a heavily contended multithreaded case) by backing off (yield, then sleep) when a lock is held for a long time. For the single-CPU case, using an OS lock that sleeps immediately on a held lock will be fastest.
class LockImpl
{
    // This is a simple spin lock.
    // 0 - in use / busy
    // 1 - free / available
public:
    static void lockResource(volatile DWORD &resourceLock)
    {
        __asm
        {
            mov ebx, resourceLock   ; ebx = address of the lock word
        InUseLoop:
            mov eax, 0              ; 0 = in use
            xchg eax, [ebx]         ; atomic swap, implicit BusLock
            cmp eax, 0
            je InUseLoop            ; old value was 0, lock is held, retry
        }
    }

    static void lockResource_FasterVersion(volatile DWORD &resourceLock)
    {
        __asm
        {
            mov ebx, resourceLock   ; ebx = address of the lock word
        InUseLoop:
            mov eax, [ebx]          ; read without BusLock
            cmp eax, 0
            je InUseLoop            ; retry read if busy
            mov eax, 0
            xchg eax, [ebx]         ; XCHG with BusLock
            cmp eax, 0
            je InUseLoop            ; retry if busy
        }
    }

    static void unLockResource(volatile DWORD &resourceLock)
    {
        __asm
        {
            mov ebx, resourceLock   ; ebx = address of the lock word
            mov dword ptr [ebx], 1  ; 1 = free, explicit operand size
        }
    }
};
// A little testing code here
volatile DWORD aaa = 1;

void test()
{
    LockImpl::lockResource(aaa);
    LockImpl::unLockResource(aaa);
}
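
The backoff idea mentioned above (non-locking read first, then yield, then sleep when the lock stays held) could look roughly like the following. This is a portable sketch that swaps in std::atomic for the inline assembly, not the code the author tested. The class name BackoffSpinLock, the spin/yield thresholds, and the 1 ms sleep are all assumptions picked for illustration and would need tuning by measurement.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical class; std::atomic stands in for the inline assembly.
// Same convention as the answer: 0 = in use / busy, 1 = free / available.
class BackoffSpinLock
{
public:
    void lock()
    {
        int attempts = 0;
        for (;;)
        {
            // Plain read first, so waiting threads do not issue locked
            // exchanges while the lock is visibly held.
            if (state_.load(std::memory_order_relaxed) != 0 &&
                state_.exchange(0, std::memory_order_acquire) != 0)
            {
                return; // previous value was 1 (free): the lock is ours
            }

            ++attempts;
            if (attempts < kSpinLimit)
                continue;                   // short wait: keep spinning
            if (attempts < kYieldLimit)
                std::this_thread::yield();  // medium wait: give up the time slice
            else
                std::this_thread::sleep_for(std::chrono::milliseconds(1)); // long wait: sleep
        }
    }

    // Single attempt; returns true if the lock was acquired.
    bool try_lock()
    {
        return state_.exchange(0, std::memory_order_acquire) != 0;
    }

    void unlock()
    {
        state_.store(1, std::memory_order_release); // 1 = free
    }

private:
    static constexpr int kSpinLimit = 100;   // assumed threshold
    static constexpr int kYieldLimit = 200;  // assumed threshold
    std::atomic<int> state_{1};              // starts free
};
```

Because the sketch exposes lock() and unlock(), it can be used with std::lock_guard<BackoffSpinLock> from <mutex> for scoped locking.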