I came across this interesting paragraph in the Boost thread documentation today:

void wait(boost::unique_lock<boost::mutex>& lock)

...

Effects: Atomically call lock.unlock() and blocks the current thread. The thread will unblock when notified by a call to this->notify_one() or this->notify_all(), or spuriously. When the thread is unblocked (for whatever reason), the lock is reacquired by invoking lock.lock() before the call to wait returns. The lock is also reacquired by invoking lock.lock() if the function exits with an exception.

So what I am interested in is the meaning of the word "spuriously". Why would the thread be unblocked for spurious reasons? What can be done to resolve this?

+1  A: 

This blog post gives a reason for Linux, in terms of the futex system call returning when a signal is delivered to a process. Unfortunately it doesn't explain anything else (and indeed is asking for more information).

The Wikipedia entry on spurious wakeups (which appear to be a POSIX-wide concept, by the way, not limited to Boost) may interest you too.

Jon Skeet
Hmm, yeah, that isn't really a satisfying answer given that it only applies to one platform. Although I guess if it really is incredibly difficult to get it to work "correctly" on Linux, then that could be a valid reason to document spurious wakeups.
1800 INFORMATION
+3  A: 

This article by Anthony Williams is particularly detailed.

Spurious wakes cannot be predicted: they are essentially random from the user's point of view. However, they commonly occur when the thread library cannot reliably ensure that a waiting thread will not miss a notification. Since a missed notification would render the condition variable useless, the thread library wakes the thread from its wait rather than take the risk.

He also points out that you shouldn't use the timed_wait overloads that take a duration inside a loop, since every spurious wake restarts the full timeout, and that you should generally use the versions that take a predicate:

That's the beginner's bug, and one that's easily overcome with a simple rule: always check your predicate in a loop when waiting with a condition variable. The more insidious bug comes from timed_wait().
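To make the timed_wait point concrete, here is a minimal sketch of a timed wait that stays bounded even when spurious wakes occur. It assumes C++11's std::condition_variable rather than the Boost class discussed in the article, and the names mtx, cv, ready and wait_ready_for are made up for illustration. The idea is to compute one absolute deadline up front and wait against that, instead of passing the same relative duration on every pass through the loop.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical shared state, for illustration only.
std::mutex mtx;
std::condition_variable cv;
bool ready = false;

// Returns true if `ready` is set by the time the deadline passes,
// false if the wait timed out with `ready` still false.
bool wait_ready_for(std::chrono::milliseconds timeout) {
    // Compute the deadline once, so a spurious wake cannot extend the wait.
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    std::unique_lock<std::mutex> lock(mtx);
    while (!ready) {
        if (cv.wait_until(lock, deadline) == std::cv_status::timeout) {
            // The lock is held again here, so reading `ready` is safe.
            return ready;
        }
    }
    return true;
}
```

Had the loop called a relative wait with the original timeout each time, every spurious wake would start the clock again, and the total wait would have no upper bound.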

This article by Vladimir Prus is also interesting.

But why do we need the while loop, can't we write:

if (!something_happened)
  c.wait(m);

We can't. And the killer reason is that 'wait' can return without any 'notify' call. That's called spurious wakeup and is explicitly allowed by POSIX. Essentially, return from 'wait' only indicates that the shared data might have changed, so that data must be evaluated again.

Okay, so why this is not fixed yet? The first reason is that nobody wants to fix it. Wrapping call to 'wait' in a loop is very desired for several other reasons. But those reasons require explanation, while spurious wakeup is a hammer that can be applied to any first year student without fail.
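For comparison with the `if` version quoted above, this is roughly what the loop form looks like. It is only a sketch: it uses std::condition_variable from C++11 as a stand-in for the Boost class (an assumption made so it compiles without extra libraries), it reuses the names m, c and something_happened from the quoted snippet, and the function names are hypothetical.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Shared state, named after the quoted snippet.
std::mutex m;
std::condition_variable c;
bool something_happened = false;

// Explicit loop: re-check the flag after every return from wait(),
// whether the wake was caused by a notify or was spurious.
void wait_for_event() {
    std::unique_lock<std::mutex> lock(m);
    while (!something_happened) {
        c.wait(lock);
    }
}

// Same behaviour using the predicate overload, which loops internally.
void wait_for_event_pred() {
    std::unique_lock<std::mutex> lock(m);
    c.wait(lock, [] { return something_happened; });
}

// Hypothetical notifier: set the flag under the mutex, then notify.
void signal_event() {
    {
        std::lock_guard<std::mutex> lock(m);
        something_happened = true;
    }
    c.notify_all();
}
```

Either waiting function can be paired with signal_event() running on another thread. The flag is always read and written under the mutex, so a wake without a notify simply sends the waiter around the loop again.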

1800 INFORMATION
Looks like we've been finding the same pages, which isn't entirely unexpected :)
Jon Skeet
Yeah, this is kind of a summary of my research today, when I figured out the cause of the horrible bug I was suffering from. The Boost documentation could be clearer on this point.
1800 INFORMATION
I've found Boost's entire system for waiting and notifying, especially in boost::interprocess, very frustrating. I would only use it if you need it to be cross-platform and done now.
jeffamaphone