views: 728
answers: 5
I read some spin lock code from here, especially this part:

inline void Enter(void)
{
    int prev_s;
    do
    {
        prev_s = TestAndSet(&m_s, 0);
        if (m_s == 0 && prev_s == 1)
        {
            break;
        }
        // relinquish current timeslice (can only
        // be used when OS available and
        // we do NOT want to 'spin')
        // HWSleep(0);
    }
    while (true);
}

Why do we need to test two conditions, m_s == 0 && prev_s == 1? I think testing just prev_s == 1 should be enough. Any ideas?

EDIT: version 2. If there is a bug, should we fix it this way?

inline void Enter(void)
{
    do
    {
        if (m_s == 0 && 1 == TestAndSet(&m_s, 0))
        {
            break;
        }
        // relinquish current timeslice (can only
        // be used when OS available and
        // we do NOT want to 'spin')
        // HWSleep(0);
    }
    while (true);
}

EDIT: version 3. I think version 3 is functionally correct, but its performance is not good enough, since it writes on every iteration with no read test beforehand. Is my understanding correct?

inline void Enter(void)
{
    do
    {
        if (1 == TestAndSet(&m_s, 0))
        {
            break;
        }
        // relinquish current timeslice (can only
        // be used when OS available and
        // we do NOT want to 'spin')
        // HWSleep(0);
    }
    while (true);
}

@dragonfly, here is my bug-fix version 4 (it fixes the bug in version 2 that you pointed out). Could you review whether it is correct, please? Thanks!

EDIT: version 4.

inline void Enter(void)
{
    do
    {
        if (m_s == 1 && 1 == TestAndSet(&m_s, 0))
        {
            break;
        }
        // relinquish current timeslice (can only
        // be used when OS available and
        // we do NOT want to 'spin')
        // HWSleep(0);
    }
    while (true);
}
+4  A: 

It looks to me like this is an attempt at optimisation gone slightly wrong. I suspect it's trying for "TATAS" - "Test-and-test-and-set", where it doesn't even try to do the TestAndSet if it can see the lock is already taken.

In a post about spin locks for .NET, Joe Duffy writes this TATAS code as:

class SpinLock {
    private volatile int m_taken;

    public void Enter() {
        while (true) {
            if (m_taken == 0 &&
                    Interlocked.Exchange(ref m_taken, 1) == 0)
                break;
        }
    }

    public void Exit() {
        m_taken = 0;
    }
}

(Note that Joe is using 1 to mean locked and 0 to mean unlocked, unlike the Code Project sample - either is fine, just don't get confused between the two!)

Note that here the call to Interlocked.Exchange is conditional on m_taken being 0. This reduces contention - the relatively expensive (I guess) test-and-set operation is avoided where it's unnecessary. I suspect that's what the author was aiming at, but didn't quite get it right.

This is also mentioned in the Wikipedia article about spinlocks under "significant optimisations":

To reduce inter-CPU bus traffic, when the lock is not acquired, the code should loop reading without trying to write anything, until it reads a changed value. Because of MESI caching protocols, this causes the cache line for the lock to become "Shared"; then there is remarkably no bus traffic while a CPU is waiting for the lock. This optimization is effective on all CPU architectures that have a cache per CPU, because MESI is so ubiquitous.

That "loop reading" is exactly what the while loop does - until it sees m_taken change, it only reads. When it sees the change (i.e. when the lock is released) it has another go at locking.

Of course it's very possible that I'm missing something important - issues like this are very subtle.

Jon Skeet
Hi Jon, I have a fix (edited into my original post); could you review whether my fix is correct, please?
George2
Hi Jon, I learned more from your comments and on my own, and I have edited version 3 of the code into my original post. Could you review whether both the code and my understanding of version 3 are correct, please? Thanks!
George2
See my comments on the question.
Jon Skeet
+1  A: 

Actually, another thread may call TestAndSet(&m_s, 1) (i.e. Leave()) just after the TestAndSet(&m_s, 0) and before the if test in Enter(). In that case the lock will not be obtained and m_s will not be equal to 0. That is why the check is needed.

dragonfly
If another thread is exiting the lock without having successfully entered it to start with, the lock is being misused anyway, surely - nothing guards against that.
Jon Skeet
Hi Jon, I think the situation which dragonfly mentioned above should never happen, correct?
George2
The spinlock protocol requires that it be released exactly once, and only when held. If followed, the issue described here cannot happen, and guarding against it in the Enter() function is at best a waste of resources.
RBerteig
@RBerteig, "guarding against it in the Enter() function is at best a waste of resources" -- which line of code do you think is a waste? I think every line of code is necessary. :-)
George2
+2  A: 

Why both conditions? Because a second thread will also get the lock in that case. (Edit: but that case can't happen if all threads follow the spinlock protocol.)

If the lock is available (signaled) m_s has the value 1. When taken by some thread, it has the value 0. No other values are allowed.

Consider one thread that wants the lock; whether or not the lock is signaled at the moment the thread calls Enter() is unimportant. The thread is allowed to take the lock if m_s is 1, and it takes it by changing m_s to 0. The first iteration where that happens causes the loop to exit and the thread to hold the lock.

Now consider two threads that want the same lock. Both are calling TestAndSet(), waiting to see the value 1 become 0. Because the function TestAndSet() is atomic, only one of the waiting threads ever gets to see the value 1. All the other threads only ever see m_s as 0, and must continue to wait.

The condition where m_s is 1 after this thread has set it to 0 implies that some other thread signaled between the atomic operation and the condition check. Since only one thread at a time is supposed to hold the lock, that shouldn't happen.

I'm guessing that this is an attempt to satisfy the invariant promise of the spinlock. (Edit: I'm no longer so sure, more below...) If it is held, the value of m_s must be zero. If not, it is one. If setting it to zero didn't "stick", then something weird is happening, and it is best not to assume it is now held by this thread when the invariant is not true.

Edit: Jon Skeet points out that this case might be a flaw in the original implementation. I suspect he's right.

The race condition being guarded against is a thread that doesn't have the right to signal the spinlock signaling it anyway. If you can't trust callers to follow the rules, spinlocks probably aren't the synchronization method of choice, after all.

Edit 2: The proposed revision looks much better. It would clearly avoid the multi-core cache coherency interaction that the original had due to always writing the sentinel m_s.

After reading about the TATAS protocol (you can learn something new every day, if you pay attention...) and the multicore cache coherency issue it is addressing, it is clear to me that the original code was trying to do something similar, but without understanding the subtlety behind it. It would indeed have been safe (assuming all callers follow the rules) to drop the redundant check on m_s as it was written. But the code would have been writing to m_s on every loop iteration, and that would have played havoc in a real multicore chip with caching per core.

The new spinlock is still vulnerable to a second thread releasing it without holding it. There is no way to repair that. My earlier claim about trusting the callers to follow protocol still applies.

RBerteig
I think guarding against "something weird happening" demands a fuller explanation from the author of the article. I think it's more likely (given typos, and a correction to the later example) that the code is just flawed.
Jon Skeet
You might be right that this implementation is flawed. I try not to roll my own synchronization primitives, myself.
RBerteig
I completely agree about not rolling your own primitives. Let's leave that to experts :)
Jon Skeet
Hi RBerteig, what is your final conclusion on this issue? Is there a bug? If so, I have a fix (edited into my original post); could you review whether my fix is correct, please?
George2
Yes, it was a bug, but probably benign.
RBerteig
What do you mean by "benign"? A typo?
George2
@RBerteig, I learned more from your comments and on my own, and I have edited version 3 of the code into my original post. Could you review whether both the code and my understanding of version 3 are correct, please? Thanks!
George2
@RBerteig, "It would clearly avoid the multi-core cache coherency interaction that the original had due to always writing the sentinel m_s" -- could you describe this in more detail, please? I am confused about what you mean by "always writing the sentinel m_s" and "cache coherency". Thanks!
George2
@George2, V3 is correct, but as you noted, suffers from writing on every loop. V2 does have a typo, and V4 fixes it. My comment about learning something new was about me... I'd never run into that cache issue before.
RBerteig
@George2, I wrote that the bug in V1 is benign (i.e. not dangerous) because Enter() behaves correctly in spite of it.
RBerteig
+1  A: 

All your versions are correct except 2. Also, your remarks about the check for m_s==0 in version 1 and the reduced performance in version 3 are correct.

The cause of the reduced performance is the way T&S is implemented, particularly that it causes a write on every call. A write (even one that doesn't actually change the data in m_s), or an intention to write, causes its cache line to be invalidated on the other CPUs, which means that when another CPU (also waiting for the same lock) tests m_s, it cannot read the data from its own cache, but has to fetch it from the CPU that previously owned it. This phenomenon is called cache ping-pong, after its resemblance to the ping-pong game, where the ball (in this case the cache line's data) constantly moves between the players (the CPUs). If you add the extra test before T&S, the waiting CPUs will all just read, which means they can all keep the data in their caches (i.e. share it), until somebody writes to it.

The ping-pong happens in versions 1 and 3, because they run T&S on every iteration of the loop.

Note that the remarks about the extra check protecting against another thread incorrectly freeing the lock are misleading: not because the other thread must not do it, but because such a check doesn't guard against this possibility, even remotely. Imagine that the other thread frees the lock just after the locking thread does the check. If you really want to protect yourself against this kind of breakage, you should add another variable holding e.g. the ID of the thread holding the lock, and use it to check that the lock is manipulated correctly.

Another issue is that on some architectures you need memory barriers to ensure correct memory ordering, particularly to ensure that the m_s test actually reads the value every time (a compiler barrier should be sufficient here) and that any read (and maybe write, if you wish) that occurs inside the critical section doesn't "leak out", that is, isn't performed by the CPU before the lock is actually taken. Unlock has to be handled similarly. Note that Jon Skeet's version is correct in this respect, because he uses the C# volatile keyword (his post is about .NET).

jpalecek
A: 

None of these examples is necessarily correct on hardware that has less strict ordering. PowerPC and IA64 are two such examples, and isync and .acq operations are required on the test-and-set operations that acquire the lock (similarly, lwsync and .rel operations in the exit).

Peeter Joot