ansaurus

Question

Intel Threading Building Blocks Concurrent Queue: Using pop() over pop_if_present()

Answer 1

+2 A:

With while(pop_if_present(...)) you are doing a brute-force busy wait (also called spinning) on the queue. When the queue is empty, you waste cycles keeping the CPU busy until either another thread running on a different CPU pushes an item into the queue, or the OS decides to give your CPU to some other, possibly unrelated, thread or process.

You can see how bad this is if you have only one CPU: the producer thread cannot push, and so cannot stop the consumer from spinning, until at least the end of the consumer's time quantum plus the overhead of a context switch. Clearly a mistake.

With multiple CPUs this might work better if the OS schedules (or you force) the producer thread onto a different CPU. This is the basic idea of a spin-lock - a synchronization primitive built directly on special processor instructions such as compare-and-swap or load-linked/store-conditional. Spin-locks are commonly used inside the operating system to communicate between interrupt handlers and the rest of the kernel, and to build higher-level constructs such as semaphores.

With blocking pop(), if the queue is empty, you enter a sleep wait, i.e. you ask the OS to put the consumer thread into a non-schedulable state until an event - a push onto the queue - occurs from another thread. The key here is that the processor is available for other (hopefully useful) work. The TBB implementation actually tries hard to avoid the sleep since it's expensive (entering the kernel, rescheduling, etc.). The goal is to optimize the normal case where the queue is not empty and the item can be retrieved quickly.
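To make the contrast concrete, here is a minimal sketch in standard C++. It is not TBB's implementation: SketchQueue, consume_spinning and consume_blocking are hypothetical names invented for this illustration, and the queue is a plain mutex-protected std::queue with a condition variable. It only shows the difference between a consumer that spins on a non-blocking pop and one that sleeps until a push wakes it.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical illustration queue; this is NOT tbb::concurrent_queue.
template <typename T>
class SketchQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(value));
        }
        not_empty_.notify_one();  // wake one sleeping consumer, if any
    }

    // Non-blocking: returns false immediately when the queue is empty.
    bool pop_if_present(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return false;
        take_front(out);
        return true;
    }

    // Blocking: the calling thread sleeps until an item is available.
    void pop(T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        take_front(out);
    }

private:
    // Caller must hold mutex_ and the queue must be non-empty.
    void take_front(T& out) {
        assert(!items_.empty());
        out = std::move(items_.front());
        items_.pop();
    }

    std::mutex mutex_;
    std::condition_variable not_empty_;
    std::queue<T> items_;
};

// Busy-wait consumer: burns CPU while the queue is empty.
int consume_spinning(SketchQueue<int>& q) {
    int value = 0;
    while (!q.pop_if_present(value)) {
        // spin: nothing useful happens here
    }
    return value;
}

// Sleep-wait consumer: the OS can run other threads while we wait.
int consume_blocking(SketchQueue<int>& q) {
    int value = 0;
    q.pop(value);
    return value;
}
```

In a real program a producer thread would call push() while a consumer thread runs one of the two consume functions. With consume_spinning the consumer occupies a CPU for the whole wait; with consume_blocking it is descheduled until notify_one() wakes it.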

The choice is really simple though - always sleep-wait, i.e. do a blocking pop(), unless you have to busy-wait (which is the case in real-time systems, OS interrupt context, and some very specialized applications).

Hope this helps a bit.

Nikolai N Fetissov 2010-02-20 06:54:05

Answer 2

+1 A:

Intel's TBB library is open source, so I took a look...

It looks like pop_if_present() essentially checks whether the queue is empty and returns immediately if it is. If not, it attempts to get the element at the front of the queue (which might fail, since another thread may have come along and taken it). If it misses, it performs an "atomic_backoff" pause before checking again. The atomic_backoff simply spins the first few times it's called (doubling its spin loop count each time), but after a certain number of pauses it just yields to the OS scheduler instead of spinning, on the assumption that since it's been waiting a while, it might as well do it nicely.
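The backoff behaviour described above can be sketched roughly as follows. This is an illustration written from the description, not a copy of TBB's atomic_backoff: the class name SketchBackoff, the default limit of 16 and the busy loop used as a pause are all assumptions made for the example.

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-in for the backoff helper described above.
// Names and the spin limit are assumptions, not TBB's actual code.
class SketchBackoff {
public:
    explicit SketchBackoff(int spin_limit = 16) : spin_limit_(spin_limit) {
        assert(spin_limit > 0);
    }

    // Returns true if this call spun, false if it yielded to the scheduler.
    bool pause() {
        if (count_ <= spin_limit_) {
            // Spin for count_ iterations; volatile keeps the loop from
            // being optimized away.
            volatile int sink = 0;
            for (int i = 0; i < count_; ++i) {
                sink = sink + 1;
            }
            count_ *= 2;  // wait twice as long next time
            return true;
        }
        // We have waited long enough: give up the time slice instead.
        std::this_thread::yield();
        return false;
    }

    void reset() { count_ = 1; }

    int next_spin_count() const { return count_; }

private:
    int spin_limit_;
    int count_ = 1;
};
```

A caller would typically create one SketchBackoff per wait and call pause() each time an attempt to take an item misses, so short waits stay on the CPU and long waits stop monopolizing it.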

The plain pop() function, if there isn't anything in the queue, performs atomic_backoff waits until there is something in the queue for it to take.

Note that there are at least 2 interesting things (to me anyway) about this:

the pop() function performs spin waits (up to a point) for something to show up in the queue; it's not going to yield to the OS unless it has to wait for more than a short moment. So, as you might expect, there's not much reason to spin yourself by calling pop_if_present() unless you have something else to do between calls to pop_if_present().
when pop() does yield to the OS, it does so by simply giving up its time slice. It doesn't block the thread on a synchronization object that can be signaled when an item is placed on the queue - it seems to go into a sleep/poll cycle to check the queue for something to pop. This surprised me a little.
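The one case mentioned above where polling yourself makes sense, having other work to do between calls, might look like the sketch below. TinyQueue, PollStats and poll_between_work are hypothetical names for this example only; the queue is a simple mutex-protected std::deque standing in for the TBB queue, and the fixed round count is an assumption to keep the loop bounded.

```cpp
#include <cassert>
#include <deque>
#include <mutex>

// Hypothetical minimal queue with a non-blocking pop; NOT the TBB class.
class TinyQueue {
public:
    void push(int value) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push_back(value);
    }

    // Returns false immediately when nothing is queued.
    bool pop_if_present(int& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return false;
        out = items_.front();
        items_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::deque<int> items_;
};

struct PollStats {
    int handled = 0;     // items taken from the queue
    int sum = 0;         // sum of the handled items
    int idle_steps = 0;  // rounds spent on other work instead
};

// Each round either handles one queued item or, if the queue is empty,
// runs other_work() instead of spinning on the queue.
template <typename OtherWork>
PollStats poll_between_work(TinyQueue& q, int max_rounds, OtherWork other_work) {
    assert(max_rounds >= 0);
    PollStats stats;
    for (int round = 0; round < max_rounds; ++round) {
        int item = 0;
        if (q.pop_if_present(item)) {
            ++stats.handled;
            stats.sum += item;
        } else {
            other_work();
            ++stats.idle_steps;
        }
    }
    return stats;
}
```

If there is no other_work worth doing, this loop degenerates into the plain spin discussed earlier, and a blocking pop() is the simpler choice.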

Take this analysis with a grain of salt... The source I used for this analysis might be a bit old (it's actually from concurrent_queue_v2.h and .cpp) because the more recent concurrent_queue has a different API - there's no pop() or pop_if_present(), just a try_pop() function in the latest class concurrent_queue interface. The old interface has been moved (possibly changed somewhat) to the concurrent_bounded_queue class. It appears that the newer concurrent_queues can be configured when the library is built to use OS synchronization objects instead of busy waits and polling.

Michael Burr 2010-02-21 10:54:52
