views:

41

answers:

1

I have a large data structure that is using striping to reduce lock contention. Right now I am using system locks but 99.99% of the time, the lock is uncontested and futhermore, the amount of time holding the lock is quite miniscule. However, several distinct memory operations are performed while the lock is held. It has actually gotten to the point where the time spent aquiring and releasing the locks is significant compared to the overall time accessing the data structure.

So I thinking about replacing the OS lock with the following very simple lock. Only try and unlock are shown here because the 99.99% of the time FastTryLock() is going to succeed. The "pLock" variable here represents a fine granularity lock in the striped structure.

I have written the following implementation which appears to work fine but I would appreciate confirmation if it is correct or incorrect.

bool FastTryLock(DWORD *pLock)
{
    if(0==AtomicXCHG(pLock,1)) {
        MemoryBarrier_LightWeight(); return(true);
    }
    return(false);
}
void FastUnlock(DWORD *pLock)
{
    MemoryBarrier_LightWeight(); *((volatile DWORD*)pLock)=0;
}

On the PC, MemoryBarrier_LightWeight() is a no-op since the CPU guarantees memory write ordering.

A: 

Yes, this is a technique called spin lock. Note, however, that casting pointer to volatile is not guaranteed to work according to standard. Just declare your lock variable as volatile instead.

doublep
Would a shadow pointer that was "volatile DWORD" work for sure ?
Adisak
Also, the difference between this and the "spin lock" you link to is that I only use the Interlocked op to acquire the lock. Memory fencing semantics should be enough to release it.
Adisak
I cannot parse through the legalistic-like nightmarish language of C++ standard, but in real-world implementations `volatile` variable will work exactly as you need. The optimization you described in the second comment is actually also mentioned in the linked article. Note that atomic types are part of C++0x standard, so you can use e.g. `std::atomic_flag` if your compiler supports that and avoid manual coding.
doublep
Actually, the second optimization doesn't seem to be there. It is doing and "XCHG" for the store in the unlock operation while I am just doing a "MOV".
Adisak
Second paragraph in the "Significant optimizations" section: "On later implementations of the x86 architecture, spin_unlock can safely use an unlocked MOV instead of the locked XCHG, which is much faster."
doublep