I've made a chain of 4 producer-consumer threads (forming a 4 step pipeline). To my amazement, all four threads are running in sequence, instead of concurrently!!! That is, the second thread mysteriously waits until the first thread is entirely done producing. The third thread mysteriously waits until the second thread is entirely done producing (and so on).

It gets worse. If I put a Thread.Sleep(300) into the loop of the first producer, then the other three threads become concurrent and actually get processor time, as expected, producing "random interleaved" console output as expected from a multi-threaded app. I'm almost unable to accept the idea that a "sleep" is a necessary part of the solution, and yet I see that a sleep is incorporated in exactly that fashion in code written by Jon Skeet.

Please tell me that it is not necessary to achieve concurrency, or if it is, explain why.

A more precise story about my particular producer-consumer chain looks like this:

  1. First thread: in a tight loop, it produces "widget" messages as fast as possible, pushing them into a queue for the next thread. A System.Threading.Timer is set for ~100 milliseconds when the first widget is added to the queue. The callback fired from the timer is the second thread...
  2. Second thread (fired from the timer): reads some or all of the widgets from the prior queue and sends them into another queue (to be consumed by the third thread). The Monitor.Pulse/Wait mechanism is used to synchronize with the third thread.
  3. Third thread: blocks on Monitor.Wait until Monitor.Pulse is called, then fetches one item from the queue. That item is pushed into the final queue, again using Monitor.Pulse when the push is done.
  4. Fourth thread: blocks on Monitor.Wait until Monitor.Pulse is called, then the widgets are processed.
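
The Pulse/Wait handoff described in steps 2-4 looks roughly like this. The sketch below is in Java, whose `wait`/`notify` is the direct analogue of .NET's `Monitor.Wait`/`Monitor.Pulse`; the class and method names are illustrative, not from my actual code:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Minimal sketch of one Monitor.Wait/Monitor.Pulse handoff between two
// pipeline stages, translated to Java's wait/notify. Illustrative only.
class Handoff<T> {
    private final Queue<T> queue = new ArrayDeque<>();

    // Producer side: push an item and pulse the consumer.
    public synchronized void push(T item) {
        queue.add(item);
        notify(); // Monitor.Pulse: wakes one waiting consumer, if any
    }

    // Consumer side: block until an item is available, then take it.
    // Note the while loop: a pulse that fires when nobody is waiting is
    // simply lost, so the consumer must re-check the condition on wakeup.
    public synchronized T take() throws InterruptedException {
        while (queue.isEmpty()) {
            wait(); // Monitor.Wait: releases the lock while blocked
        }
        return queue.remove();
    }
}

class HandoffDemo {
    public static void main(String[] args) throws Exception {
        Handoff<String> stage = new Handoff<>();
        Thread consumer = new Thread(() -> {
            try {
                System.out.println("got: " + stage.take());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        stage.push("widget-1");
        consumer.join(); // prints "got: widget-1"
    }
}
```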

To process 1 million widgets through this pipeline takes about 4 minutes. In 4 minutes, there is PLENTY of time for the last 3 threads to be scheduled and do some work concurrently with the first thread. But as I said, the last three threads run in sequence, unless I introduce a tiny sleep for the first thread. It makes no sense.

Any thoughts on why this works this way?

p.s. Please do not tell me that a long producer-consumer chain, as I've described, can be shrunk or eliminated. Please trust me (or assume) that I need a chain that long. :)

A: 

Let me guess - you're most likely using some kind of locking mechanism? I would suggest lock-free implementations of the message queues to improve your performance. Either that, or you're forcing them all onto the same hardware thread, though I'm not sure you can even do that on Windows.

I would suggest starting by reading this lock-free implementation of a producer-consumer queue that works with one producer and one consumer: The multithreaded single-producer-single-consumer without locks: http://www.codeproject.com/KB/threads/LockFree.aspx#heading0005
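
The core of such a single-producer/single-consumer queue is a ring buffer with two indices, each written by exactly one thread, so no lock is needed. A rough sketch (in Java rather than the article's C++; all names are mine):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a bounded lock-free single-producer/single-consumer queue.
// 'head' is written only by the consumer, 'tail' only by the producer;
// the AtomicInteger set/get provide the volatile publish/acquire ordering.
// Illustrative, not the linked article's code.
class SpscQueue<T> {
    private final Object[] buffer;
    private final AtomicInteger head = new AtomicInteger(0); // consumer's index
    private final AtomicInteger tail = new AtomicInteger(0); // producer's index

    SpscQueue(int capacity) {
        buffer = new Object[capacity + 1]; // one slot wasted to tell full from empty
    }

    // Called only by the single producer thread.
    boolean offer(T item) {
        int t = tail.get();
        int next = (t + 1) % buffer.length;
        if (next == head.get()) return false; // queue is full
        buffer[t] = item;
        tail.set(next); // publish the slot only after the item is written
        return true;
    }

    // Called only by the single consumer thread.
    @SuppressWarnings("unchecked")
    T poll() {
        int h = head.get();
        if (h == tail.get()) return null; // queue is empty
        T item = (T) buffer[h];
        head.set((h + 1) % buffer.length);
        return item;
    }
}
```

The safety argument rests entirely on the single-writer-per-index rule; with two producers or two consumers this sketch is no longer correct.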

I'd also suggest reading a little bit about volatile: http://www.drdobbs.com/cpp/212701484

Herb Sutter has written a good article that reminds you of the dangers of writing this kind of code: http://www.drdobbs.com/cpp/210600279;jsessionid=ZSUN3G3VXJM0BQE1GHRSKHWATMY32JVN?pgno=2

Finally I would suggest reading this article as well for another lock-free queue: http://www.drdobbs.com/architecture-and-design/210604448

I should also point out the danger of race conditions, and suggest keeping the cache-sized memory blocks one thread writes to on separate cache lines from those another thread reads, so that one thread doesn't force another to re-fetch data from main memory (false sharing).

Hope that helps

Simon
A: 

Tight loops impact multithreading. It sounds like your first thread is running fast enough that the second thread doesn't even get the chance to start. Note that if this happens, then the sequential solution is the most efficient. :)

Since you have a somewhat complex producer/consumer layout, I'm assuming that you don't see this behavior with real-world data.

While you can work around this behavior by adding Thread.Sleep with a non-zero argument (see Joe Duffy's blog entry on why this works), or just ignore it, a better solution is to limit the size of the first producer/consumer queue. This allows the first thread to produce only a certain number of widgets before blocking until the rest of the pipeline has a chance to start.

The .NET 4.0 BlockingCollection<T> allows you to specify a maximum size. The Microsoft Rx library has backported it to .NET 3.5, so you could use that if you need to. (I do recommend this over an in-house solution.)
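
The effect of bounding the first queue can be sketched as follows. This uses Java's ArrayBlockingQueue as a stand-in for `new BlockingCollection<int>(boundedCapacity: 100)`; the names and sizes are illustrative:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Java analogue of a bounded BlockingCollection<T>: once 'bound' widgets
// are queued, the producer's put() blocks, so it cannot run arbitrarily
// far ahead of the rest of the pipeline. Illustrative sketch.
class BoundedPipeline {
    // Runs one producer and one consumer over a bounded queue and returns
    // how many items were left unconsumed (should be 0).
    static int runPipeline(int widgets, int bound) throws InterruptedException {
        BlockingQueue<Integer> firstQueue = new ArrayBlockingQueue<>(bound);

        Thread producer = new Thread(() -> {
            try {
                for (int w = 0; w < widgets; w++) {
                    firstQueue.put(w); // blocks while the queue holds 'bound' items
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < widgets; i++) {
                    firstQueue.take(); // blocks while the queue is empty
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
        return firstQueue.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("items left: " + runPipeline(1_000_000, 100));
    }
}
```

Because put() blocks at the bound, the producer is forced to yield the CPU to the downstream threads, which is exactly the scheduling pressure your Thread.Sleep was supplying by accident.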

Stephen Cleary
"It sounds like your first thread is running fast enough that the second thread doesn't even get the chance to start." The million dollar question is: why is that so? Why does the run-speed of one thread prevent another thread from running? That should never happen, ever!
Brent Arias
Remember that there is a thread scheduler involved. A thread gets its timeslice, and in some situations even more. How long does the first thread actually run?
Stephen Cleary
@Stephen: it runs for 1 million records (a couple minutes) with a 30ms quantum. The thread scheduler is supposed to round-robin schedule other threads with the same priority. Normally, only holding a contended lock should cause this behavior. But my first thread only has the lock about half the time. With 1 million records, assuredly **at least once** the thread would suspend outside of the lock, allowing the next thread to run.
Brent Arias
You're not messing with thread priorities, are you? All these threads are at the default priority, correct?
Stephen Cleary
@Stephen: yes, they are all at the same priority (I'm not touching priorities). I figured "perhaps the first thread, a foreground thread, has higher priority than the second thread, a background threadpool thread." Based on that, I changed the first thread into a background thread (still without touching priorities explicitly). That adjustment made no change to the outcome.
Brent Arias