views:

256

answers:

4

.NET 4 includes new concurrent data structures. The Bag and Dictionary collections have obvious applications but I cannot see any use for the Queue and Stack data structures. What are people using these for?

Also, I've noticed that the design based upon linked lists incurs a lot of allocation and that destroys scalability. This is surprising given that the sole purpose of these collections is multicore programming. Is this an inherent limitation or are they just badly implemented?

+2  A: 

The fairly obvious scenario for a queue would be one (or more) threads putting work items in the queue, and several worker threads extracting them for concurrent processing.

I'd imagine that linked-list based design is to make it lock-free. What's wrong with the scalability of that, and which other options did you have in mind?

Pavel Minaev
@Pavel: Is the `ConcurrentBag` not better suited to your scenario than the `ConcurrentQueue`?
Jon Harrop
@Jon: It depends - if you want to roughly preserve the order for processing, CQ makes more sense than CB.
Reed Copsey
@Reed: And in what practical scenarios might you want to "roughly preserve the order"?
Jon Harrop
@Jon @Reed F# MailboxProcessor would be one example
Mitya
@Jon: Mitya is correct. I've needed to do this for large processing of streamed data, where preserving the order wasn't a requirement, but the "overall priority" of requests that needed processing was roughly based on when a request came in. This is very common, IMO.
Reed Copsey
@Mitya: Would a ConcurrentBag not be at least as good in every respect and better for performance because adding elements is more scalable? If you don't care about order then choosing a queue would simply introduce unnecessary contention.
Jon Harrop
@Jon: the contract on MailboxProcessor is that if you post synchronously message A and then message B, they will be processed in that order.
Mitya
@Mitya: And ConcurrentBag satisfies that ordering because it is a per-producer queue, right?
Jon Harrop
@Jon: my understanding is that it is not part of ConcurrentBag contract (although _I think_ that implementation works that way).
Mitya
+7  A: 

Stacks and Queues are incredibly useful in concurrent programming, just as in sequential programming.

The new ConcurrentQueue<T> and ConcurrentStack<T> classes provide a very nice, thread-safe implementation of a Queue and a Stack. These are particularly useful when dealing with multi-threaded producer/consumer scenarios, as both classes are lockless (good for scalability) and threadsafe, as well as fairly performant.

Also, I'd like to point out one thing - you have two misconceptions in your second paragraph. Linked lists are not particularly bad for scalability. Memory allocation ~may~ need to occur regularly (though there are ways to combat this), but often, that is a smaller price to pay than other potential issues in terms of scalability. (This really depends on the scenario...) Also, the new ConcurrentQueue<T> and ConcurrentStack<T> classes are not based on a (at least traditional) linked list. They're a lockless class which internally uses a linked list of arrays to hold the elements, more like std::deque.

Reed Copsey
Hmm, CQ is obvious. But LIFO in concurrent programming requires a Real Life example. Do you have one?
Hans Passant
@Hans: I agree that LIFO concurrent is pretty odd. I've personally never needed it (though I use CQ a lot). A single producer thread, with a single consumer thread, could use a CS though, just like a regular stack, for a prod/consumer scenario, and avoid the locking via Push()/TryPop() that would be required if a System.Collections.Generic.Stack<T> was used.
Reed Copsey
Btw - why the downvotes? Really curious here...
Reed Copsey
It wasn't me, but I reckon that "incredibly useful" for stacks could draw flak.
Hans Passant
@Hans and downvoters: Basically, ConcurrentStack can be useful (in conjunction with BlockingCollection) if you want to have your stack processing happen in a separate thread from the producing thread. This gives you a single LIFO stack processor that doesn't block the producing thread while doing work. Also, ConcurrentStack can be used to preserve processing order ~for specific parts~ of the stack via PushRange(), which potentially makes the usage more obvious at times.
Reed Copsey
-1: I downvoted because I felt this did not answer the question and because it is technically inaccurate.
Jon Harrop
@Jon: when claiming that something is "technically inaccurate", it is worth detailing what _precisely_ do you feel to be technically inaccurate, and correct those inaccuracies, with references for potentially contentious claims.
Pavel Minaev
@Jon: You claim this is technically inaccurate. Which part, exactly? I have tested these data structures fairly extensively - to my knowledge, no statement here is inaccurate.
Reed Copsey
Jon Harrop
@Jon: ThreadLocal data vs. shared data is not at all a fair comparison. The "fair" comparison would be adding and removing from a single List<T> or other collection, and introducing the synchronization required to make it behave appropriately. I've NEVER said that a ConcurrentBag should be used in all situations, but thread local storage is not always an option - if you need a collection with shared state, you'll find it difficult to beat out the CB/CS/CQ implementations.
Reed Copsey
@Reed: No, that is a perfectly fair comparison because the data are not transferred between cores at all in this benchmark and the ConcurrentBag is supposed to be specifically optimized for that case, yet it does not scale.
Jon Harrop
A: 

Addressing the second part of your question, you are quite right that although concurrent collections implementations in .Net 4.0 are trying to be lock-free, they still sit on top of non-lock-free memory allocation subsystem.

Memory management is a bane of all lock-free data structures: here is a nice presentation on the state of the art: http://sysrun.haifa.il.ibm.com/hrl/ISMM2009/program.html#7

The bottom line is that this area is very much research-in-progress, so probably not yet quite ready to be included in wide-deployment production platforms such as .Net.

Mitya
This is insane. Your decent answer gets downvoted when the total bullshit answer has eight upvotes. WTF?!
Jon Harrop
+1  A: 

Here a recent blog post which covers precisely the question you are after (GC issues with using ConcurrentBag and TPL), suggests ways to discover and analyze this in the field (VS2010 Concurrency Visualizer). The use of Server GC is suggested for partial solution.

http://blogs.msdn.com/b/hshafi/archive/2010/06/17/case-study-parallelism-and-memory-usage-vs2010-tools-to-the-rescue.aspx?wa=wsignin1.0

Mitya
Another great answer, thanks Mitya!
Jon Harrop