I have been going over the practicality of some of the new parallel features in .NET 4.0.

Say I have code like so:

foreach (var item in myEnumerable)
    myDatabase.Insert(item.ConvertToDatabase());

Imagine myDatabase.Insert is performing some work to insert to a SQL database.

Theoretically you could write:

Parallel.ForEach(myEnumerable, item => myDatabase.Insert(item.ConvertToDatabase()));

And automatically you get code that takes advantage of multiple cores.

But what if myEnumerable can only be interacted with by a single thread? Will the Parallel class enumerate by a single thread and only dispatch the result to worker threads in the loop?

What if myDatabase can only be interacted with by a single thread? It would certainly not be better to make a database connection per iteration of the loop.

Finally, what if my "var item" happens to be a UserControl or something that must be interacted with on the UI thread?

What design pattern should I follow to solve these problems?

It looks to me like switching over to Parallel/PLINQ/etc. is not exactly easy when you are dealing with real-world applications.

+2  A: 

As you have surmised, taking advantage of Parallel.For or Parallel.ForEach requires that you have the ability to compose your work into discrete units (embodied by your lambda statement that is passed to the Parallel.ForEach) that can be executed independently.

Robert Harvey
Do any real-world problems meet these criteria? In other words, will the average application even be able to use these parallel features?
Jonathan.Peppers
@Jonathan: Absolutely. Have a look at this presentation by Scott Hanselman, where he shows a vivid example of how this works. http://channel9.msdn.com/posts/matthijs/Lap-Around-NET-4-with-Scott-Hanselman/ The demonstration starts at 38 minutes, 55 seconds into the talk, and ends at 47:02.
Robert Harvey
Apparently their website has some trouble skipping to 38:55; I will have to watch the whole thing at home and get back to you. I am still skeptical that they are going to deliver a good example.
Jonathan.Peppers
Well, it finally loaded. His example for Parallel.For/ForEach looks, to me, like something that would be better handled by an RDBMS. Other features, like the Task class and cancellation tokens, seem totally legit. So my only complaint is really the usability of Parallel.For/ForEach, which is entirely useful for implementing specific algorithms, etc., but not for a run-of-the-mill real-world application.
Jonathan.Peppers
A: 

There is a great discussion in the answers and comments here: http://stackoverflow.com/questions/2774170/parallel-for-different-results-for-simple-addition.

The answer is no: the parallel extensions will not think for you. Multithreading issues still apply here. This is nice syntactic sugar, but not a panacea.

Andrey
It's a little bit more than just syntactical sugar. For example, you can specify the degree of parallelism, and hook up a cancel routine that will gracefully unwind all of the threads.
Robert Harvey
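To make the comment above concrete, here is a minimal sketch of capping the degree of parallelism and wiring in cancellation through ParallelOptions. The OptionsDemo class, the Run helper and its parameters are hypothetical names made up for this illustration; only ParallelOptions, Parallel.ForEach and the cancellation types come from the framework.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class OptionsDemo
{
    // Hypothetical helper: returns true if the loop ran to completion,
    // false if it was cancelled through the token.
    public static bool Run(int[] items, int maxThreads, CancellationToken token, Action<int> body)
    {
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = maxThreads, // upper bound on concurrent iterations
            CancellationToken = token            // observed by the loop between iterations
        };

        try
        {
            Parallel.ForEach(items, options, body);
            return true;
        }
        catch (OperationCanceledException)
        {
            // Raised once the token is cancelled; iterations that were
            // already running are allowed to finish first.
            return false;
        }
    }

    public static void Main()
    {
        int[] items = Enumerable.Range(0, 100).ToArray();

        int processed = 0;
        bool completed = Run(items, 2, CancellationToken.None,
                             i => Interlocked.Increment(ref processed));
        Console.WriteLine(completed + " " + processed);

        using (var cts = new CancellationTokenSource())
        {
            cts.Cancel(); // cancel up front so the loop refuses to start
            Console.WriteLine(Run(items, 2, cts.Token, i => { }));
        }
    }
}
```

Neither option makes the loop body thread safe; they only bound how much of it runs at once and give it a clean way to stop.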
+11  A: 

The IEnumerable<T> interface is inherently not thread safe. Parallel.ForEach will automatically handle this, and only parallelize the items coming out of your enumeration. (The sequence will always be traversed, one element at a time, in order - but the resulting objects get parallelized.)
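As a rough illustration of the point above, the sketch below feeds a plain iterator (no locking of its own) into Parallel.ForEach. The loop serializes access to the enumerator, while the body runs on worker threads and therefore has to protect any state it shares. EnumerationDemo, Numbers and SumOfSquares are hypothetical names used only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class EnumerationDemo
{
    // Hypothetical source: an ordinary iterator with no internal locking.
    public static IEnumerable<int> Numbers(int count)
    {
        for (int i = 1; i <= count; i++)
            yield return i;
    }

    // Squares every item in parallel and returns the total.
    public static long SumOfSquares(IEnumerable<int> source)
    {
        long total = 0;
        Parallel.ForEach(source, item =>
        {
            // The body runs concurrently on worker threads, so the shared
            // running total is updated atomically.
            Interlocked.Add(ref total, (long)item * item);
        });
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(SumOfSquares(Numbers(1000))); // prints 333833500
    }
}
```

Replacing Interlocked.Add with a plain `total += ...` would reintroduce a race, which is the kind of problem the loop cannot fix for you.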

If your classes (i.e., the T) cannot be handled by multiple threads, then you should not try to parallelize this routine. Not every sequence is a candidate for parallelization - which is one reason why this isn't done automatically by the compiler ;)

If you're doing work which requires working with the UI thread, this is still potentially possible. However, you'll need to take the same care you would anytime you're dealing with user interface elements on background threads, and marshal the data back onto the UI thread. This can be simplified in many cases using the new TaskScheduler.FromCurrentSynchronizationContext API. I wrote about this scenario on my blog here.
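A minimal sketch of that marshalling pattern, assuming the method is called from a thread that has a SynchronizationContext (such as a WinForms or WPF UI thread). UiMarshalDemo and ComputeThenUpdate are hypothetical names; the console Main installs the base SynchronizationContext purely as a stand-in so the sample can run outside a UI application.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class UiMarshalDemo
{
    // Runs compute on a thread-pool thread, then runs updateUi through the
    // scheduler captured from the caller's SynchronizationContext.
    public static Task ComputeThenUpdate(Func<int> compute, Action<int> updateUi)
    {
        // Throws InvalidOperationException if the calling thread has no
        // SynchronizationContext, so call this from the UI thread.
        TaskScheduler uiScheduler = TaskScheduler.FromCurrentSynchronizationContext();

        return Task.Factory
            .StartNew(compute, CancellationToken.None,
                      TaskCreationOptions.None, TaskScheduler.Default)
            .ContinueWith(t => updateUi(t.Result), uiScheduler);
    }

    public static void Main()
    {
        // Stand-in only: the base context posts callbacks to the thread pool,
        // whereas a real UI context posts them to the UI thread.
        SynchronizationContext.SetSynchronizationContext(new SynchronizationContext());

        // Blocking with Wait() is acceptable here, but would deadlock on a
        // real UI thread because the continuation needs that thread to run.
        ComputeThenUpdate(() => 6 * 7, value => Console.WriteLine(value)).Wait();
    }
}
```

In a real application the updateUi delegate would touch the control, and the caller would return to the message loop instead of waiting.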

Reed Copsey
Best answer so far, side question though: say my loop-body performs a long running IO operation (network request, database, etc.), will the Parallel class detect sleeping/suspended threads and automatically start a new one? Or will it be limited to the number of cores on the machine?
Jonathan.Peppers
@Jonathan.Peppers: The default task scheduler handles this pretty well. It will inject extra threads in that situation. (By default, the ThreadPool allows many more threads than cores, and scales back dynamically based on workload.)
Reed Copsey
+5  A: 

All of these are legitimate issues - and PLINQ/TPL don't attempt to address them. It's still your job as a developer to write code that can function correctly when parallelized. There's no magic that the compiler/TPL/PLINQ can do to convert code that is unsafe for multithreading into thread-safe code ... you have to make sure that you do so.

For some of the situations you described, you should first decide whether parallelization is even sensible. If the bottleneck will be acquiring a connection to a database or ensuring correct sequencing of operations, then perhaps multithreading isn't appropriate.

In the case of how TPL streams an enumerable to multiple threads, your supposition is essentially correct. The enumerator is only ever advanced by one thread at a time (access to it is synchronized, although it is not guaranteed to always be the same thread), and each work item is then (potentially) dispatched to a separate thread to be acted on. The IEnumerable<T> interface is inherently not thread-safe, but TPL handles this behind the scenes for you.

What PLINQ/TPL do help you do is manage when and how to dispatch work to multiple threads. The TPL detects when there are multiple cores on a machine and automatically scales the number of threads used to process the data. If a machine only has a single CPU/core, then TPL may choose not to parallelize the work. The benefit to you, the developer, is not having to write two different paths - one for parallel logic, one for sequential. However, the responsibility is still yours to make sure that your code can be safely accessed from multiple threads concurrently.
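A small PLINQ sketch of the "one code path" idea: the same query is written once and the runtime decides how many threads to use on the machine it happens to run on. PlinqDemo and Squares are hypothetical names chosen for the illustration.

```csharp
using System;
using System.Linq;

public static class PlinqDemo
{
    // One code path: the runtime decides how many worker threads to use.
    public static int[] Squares(int[] input)
    {
        return input
            .AsParallel()       // opt in to parallel execution
            .AsOrdered()        // keep results in source order
            .Select(x => x * x) // must be safe to run concurrently
            .ToArray();
    }

    public static void Main()
    {
        Console.WriteLine("Cores: " + Environment.ProcessorCount);
        Console.WriteLine(string.Join(", ", Squares(new[] { 1, 2, 3, 4 }))); // prints 1, 4, 9, 16
    }
}
```

The lambda passed to Select only reads its argument, which is why it is safe here; a delegate that mutated shared state would still need its own synchronization.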

What design pattern should I follow to solve these problems?

There's no one answer to this question... however, a general practice is to employ immutability in your object design. Immutability makes it safer to consume an object across multiple threads and is one of the most common practices in making operations parallelizable. In fact, languages like F# make use of immutability extensively to allow the language to help make concurrent programming easier.

If you're on .NET 4.0, you should also look into the ConcurrentXXX collection classes in System.Collections.Concurrent. This is where you'll find some lock-free and fine-grained locking collection constructs that make writing multithreaded code easier.
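One possible way to apply these collections to the original question, offered as a sketch rather than a recommendation: do the thread-safe part (the conversion) in parallel, collect the results in a ConcurrentQueue<T>, and then perform the inserts from the calling thread only. ConvertThenInsertDemo and Process are hypothetical names, and the List<string> merely stands in for a database object that must not be shared between threads.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ConvertThenInsertDemo
{
    // Converts items in parallel, then calls insert from the calling thread only.
    // Returns the number of rows handed to insert.
    public static int Process<TIn, TOut>(
        IEnumerable<TIn> items, Func<TIn, TOut> convert, Action<TOut> insert)
    {
        var converted = new ConcurrentQueue<TOut>();

        // convert must be safe to call concurrently; Enqueue is thread safe.
        Parallel.ForEach(items, item => converted.Enqueue(convert(item)));

        // Single-threaded phase: the non-thread-safe resource is touched here only.
        // Note that the queue order may differ from the source order.
        int inserted = 0;
        TOut row;
        while (converted.TryDequeue(out row))
        {
            insert(row);
            inserted++;
        }
        return inserted;
    }

    public static void Main()
    {
        var rows = new List<string>(); // stand-in for a connection that is not thread safe
        int count = Process(new[] { 1, 2, 3 }, i => "row " + i, r => rows.Add(r));
        Console.WriteLine(count); // prints 3
    }
}
```

This only pays off when the conversion is the expensive part; if the insert dominates, the parallel phase buys little.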

LBushkin
A: 

This is a very good question, and the answer is not 100% clear or concise. I would point you to this reference from Microsoft; it lays out a good bit of detail as to WHEN you should use the parallel features.

Mitchel Sellers