I have a large data structure that uses lock striping to reduce contention. Right now I am using system locks, but 99.99% of the time the lock is uncontested, and furthermore the time spent holding the lock is minuscule. However, several distinct memory operations are performed while the lock is held. It has gotten to the point where the time spent acquiring and releasing the locks is significant compared to the overall time spent accessing the data structure.
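To make the striping concrete, here is a rough sketch of the kind of layout I mean. The names (`LockWord`, `kStripeCount`, `g_stripeLocks`, `StripeFor`) and the stripe count are made up for illustration and are not my actual code; the only point is that each key maps to one small lock word rather than to a single global lock.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical layout, for illustration only.
using LockWord = std::uint32_t;             // stand-in for the DWORD lock word

constexpr std::size_t kStripeCount = 64;    // assumed power of two, so masking works

LockWord g_stripeLocks[kStripeCount] = {};  // one lock word per stripe, 0 = free

// Map a key to the lock word that guards its stripe.
inline LockWord *StripeFor(std::uint64_t key)
{
    std::size_t h = std::hash<std::uint64_t>{}(key);
    return &g_stripeLocks[h & (kStripeCount - 1)];
}
```

Two keys that hash to different stripes never touch the same lock word, which is why contention on any one lock is so rare.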
So I am thinking about replacing the OS lock with the following very simple lock. Only try-lock and unlock are shown here because 99.99% of the time FastTryLock() is going to succeed. The "pLock" variable represents one fine-grained lock in the striped structure.
I have written the following implementation, which appears to work fine, but I would appreciate confirmation of whether it is correct or incorrect.
bool FastTryLock(DWORD *pLock)
{
    if(0==AtomicXCHG(pLock,1)) {
        MemoryBarrier_LightWeight();
        return(true);
    }
    return(false);
}

void FastUnlock(DWORD *pLock)
{
    MemoryBarrier_LightWeight();
    *((volatile DWORD*)pLock)=0;
}
On the PC (x86/x64), MemoryBarrier_LightWeight() is a no-op, since the CPU guarantees that writes are not reordered with other writes.
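For comparison, the same pair could be sketched with C++11 `std::atomic`, where the barrier placement is expressed as acquire/release ordering on the lock word itself. This is only a sketch under the assumption that the lock word can be declared as `std::atomic<std::uint32_t>`; the `*Portable` names are hypothetical and not part of the code above.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical portable counterpart of FastTryLock.
inline bool FastTryLockPortable(std::atomic<std::uint32_t> *pLock)
{
    // exchange returns the previous value; 0 means the lock was free and is now ours.
    // acquire: accesses inside the critical section cannot move before this exchange.
    return pLock->exchange(1, std::memory_order_acquire) == 0;
}

// Hypothetical portable counterpart of FastUnlock.
inline void FastUnlockPortable(std::atomic<std::uint32_t> *pLock)
{
    // release: accesses inside the critical section cannot move after this store.
    pLock->store(0, std::memory_order_release);
}
```

The acquire on the exchange plays the role of the barrier after AtomicXCHG, and the release on the store plays the role of the barrier before the write of 0. The atomic operations also constrain the compiler, which a barrier macro that expands to nothing does not do by itself.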