With the while(pop_if_present(...)) loop you are doing a brute-force busy wait (also called spinning) on the queue. When the queue is empty, you waste cycles by keeping the CPU busy until either another thread, running on a different CPU, pushes an item into the queue, or the OS decides to give your CPU to some other, possibly unrelated, thread or process.
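To make the wasted cycles concrete, here is a minimal sketch of a spinning consumer. PollingQueue is a hypothetical stand-in written with standard C++ only; it is not TBB's queue, and its pop_if_present merely borrows the name from the question. SpinResult and spin_consume are likewise illustrative names.

```cpp
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical stand-in for a queue with a non-blocking pop (not TBB's class).
class PollingQueue {
public:
    void push(int value) {
        std::lock_guard<std::mutex> guard(mutex_);
        items_.push(value);
    }

    // Returns false immediately when the queue is empty; never sleeps.
    bool pop_if_present(int& out) {
        std::lock_guard<std::mutex> guard(mutex_);
        if (items_.empty()) return false;
        out = items_.front();
        items_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::queue<int> items_;
};

struct SpinResult {
    int sum;            // sum of all items received
    long wasted_polls;  // polls that found the queue empty
};

// The consumer spins until it has received n items from the producer thread.
SpinResult spin_consume(int n) {
    PollingQueue q;
    std::thread producer([&q, n] {
        for (int i = 1; i <= n; ++i) q.push(i);
    });

    SpinResult result{0, 0};
    int received = 0;
    int item = 0;
    while (received < n) {
        if (q.pop_if_present(item)) {
            result.sum += item;
            ++received;
        } else {
            ++result.wasted_polls;  // CPU time burned doing nothing useful
        }
    }
    producer.join();
    return result;
}
```

The value of wasted_polls varies from run to run and depends on scheduling; every one of those polls is CPU time that no other thread could use.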
You can see how bad this is if you have only one CPU: the producer thread cannot run, so it cannot push an item and stop the consumer's spinning, until at least the end of the consumer's time quantum plus the overhead of a context switch. Clearly a mistake.
With multiple CPUs this might be better if the OS selects (or you force) the producer thread to run on a different CPU. This is the basic idea of a spin-lock: a synchronization primitive built directly on special processor instructions such as compare-and-swap or load-linked/store-conditional, commonly used inside the operating system to communicate between interrupt handlers and the rest of the kernel, and to build higher-level constructs such as semaphores.
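As an illustration of the spin-lock idea, here is a minimal user-space sketch built on compare-and-swap via std::atomic. It is not how any particular kernel or TBB implements its locks; SpinLock and count_with_spinlock are hypothetical names, and the yield() call is an assumption added to keep the demo well behaved when there are more threads than CPUs.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Minimal spin-lock: one atomic flag flipped with compare-and-swap.
class SpinLock {
public:
    void lock() {
        bool expected = false;
        // Try to change false -> true. On failure, `expected` is overwritten
        // with the current value, so reset it before retrying.
        while (!locked_.compare_exchange_weak(expected, true,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed)) {
            expected = false;
            std::this_thread::yield();  // assumption: be polite when oversubscribed
        }
    }

    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Hypothetical helper: each thread increments a shared counter under the lock.
long count_with_spinlock(int threads, int iterations) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&lock, &counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;  // protected by the spin-lock
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

Spinning like this only pays off when the lock is held for a very short time and the holder is running on another CPU.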
With a blocking pop(), if the queue is empty, you enter a sleep wait, i.e. you ask the OS to put the consumer thread into a non-schedulable state until an event (a push onto the queue from another thread) occurs. The key here is that the processor is available for other (hopefully useful) work. The TBB implementation actually tries hard to avoid the sleep since it's expensive (entering the kernel, rescheduling, etc.). The goal is to optimize the normal case where the queue is not empty and the item can be retrieved quickly.
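For comparison, here is a minimal sketch of a sleep-waiting pop using a mutex and a condition variable from the standard library. It only illustrates the general technique and is not TBB's implementation (in particular, it does nothing to avoid the sleep); BlockingQueue and sum_through_queue are hypothetical names.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

template <typename T>
class BlockingQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            items_.push(std::move(value));
        }
        not_empty_.notify_one();  // wake one sleeping consumer, if any
    }

    // Sleeps (without burning CPU) until an item is available.
    T pop() {
        std::unique_lock<std::mutex> guard(mutex_);
        // wait() releases the mutex while sleeping and rechecks the
        // predicate on every wakeup, so spurious wakeups are harmless.
        not_empty_.wait(guard, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable not_empty_;
    std::queue<T> items_;
};

// Hypothetical demo: a producer thread pushes 1..n, the caller pops and sums.
int sum_through_queue(int n) {
    BlockingQueue<int> q;
    std::thread producer([&q, n] {
        for (int i = 1; i <= n; ++i) q.push(i);
    });
    int sum = 0;
    for (int i = 0; i < n; ++i) sum += q.pop();
    producer.join();
    return sum;
}
```

While the consumer is blocked inside wait(), the OS is free to run the producer or any other thread on that CPU.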
The choice is really simple though: always sleep-wait, i.e. do a blocking pop(), unless you have to busy-wait (which is the case in real-time systems, OS interrupt context, and some very specialized applications).
Hope this helps a bit.