views:

92

answers:

3

Consider the following simple code pattern:

foreach (Item item in itemList)
{
   if (item.Foo)
   {
      DoStuff(item);
   }
}

If I want to parallelize it using Parallel Extensions (PE), I might simply replace the foreach loop construct as follows:

Parallel.ForEach(itemList, delegate(Item item)
{
   if(item.Foo)
   {
      DoStuff(item);
   }
});

However, PE will perform unnecessary work assigning items to threads in those cases where Foo turns out to be false. So I was thinking an intermediate wrapper/filtering IEnumerable might be a reasonable approach here. Do you agree? If so, what is the simplest way of achieving this? (BTW, I'm currently using C# 2, so I'd be grateful for at least one example that doesn't use lambda expressions etc.)

+1  A: 

If I were to implement this, I would simply filter the list before calling the foreach.

var actionableResults = from x in itemList where x.Foo select x;

This will filter the list to get the items that can be acted upon.

NOTE: this might be a premature optimization, and may not make a major difference to your performance.
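To make the filter-first approach concrete, here is a minimal, self-contained sketch of it on .NET 4 / C# 3, where the LINQ `Where` is available out of the box. `Item`, `Foo` and `DoStuff` are placeholders standing in for the question's types, not real APIs:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

class Item
{
    public bool Foo;                  // placeholder for the real flag
}

class Program
{
    static void DoStuff(Item item)    // placeholder for the real per-item work
    {
        Console.WriteLine("processing an item");
    }

    static void Main()
    {
        List<Item> itemList = new List<Item>
        {
            new Item { Foo = true },
            new Item { Foo = false },
            new Item { Foo = true }
        };

        // Filter first, so only items with Foo == true are ever handed
        // to the parallel loop.
        Parallel.ForEach(itemList.Where(x => x.Foo), DoStuff);
    }
}
```

Because `Where` is lazy, the filtering is streamed rather than building an intermediate list.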

Mitchel Sellers
locster
If you're willing to use a pre-release port of PE in C# 2, you should have no objection to using LINQBridge to add LINQ-to-Objects support as well... http://goo.gl/QuEC
Joel Mueller
+1  A: 

That all factors down to this single line:

Parallel.ForEach(itemList.Where(i => i.Foo), DoStuff);

But reading a comment on another post I now see you're still on .NET 2.0, so some of this may be a bit tricky to sneak past the compiler.

For .NET 2.0, I think you can do it like this (I'm not certain that passing the method names as delegates will still just work, but I think it will):

public static IEnumerable<T> Where<T>(IEnumerable<T> source, Predicate<T> predicate)
{
   foreach (T item in source)
       if (predicate(item))
          yield return item;
}

public bool HasFoo(Item item) { return item.Foo; }

Parallel.ForEach(Where(itemList, HasFoo), DoStuff);
Joel Coehoorn
I'd replace `i => DoStuff(i)` with `DoStuff`.
Mehrdad Afshari
good point. I'm used to needing to supply the whole expression syntax, it's easy to forget I can get shorter for a single method call. done.
Joel Coehoorn
This makes me think that the best plan is to leave the code as it is for now, tag it with a TODO and re-visit it when I upgrade my project to C#3. Otherwise I'm going to get messy code for the sake of little gain.
locster
This won't work (as written) in .NET 2 with C# 2 - you need C# 3 for extension methods (it'd run on .NET 2, but not compile in VS 2005)...
Reed Copsey
Yeah, forgot that. Fixed now.
Joel Coehoorn
+4  A: 

I'm not sure how the partitioning in PE for .NET 2 works, so it's difficult to say there. If each element is being pushed into a separate work item (which would be a fairly poor partitioning strategy), then filtering in advance would make quite a bit of sense.

If, however, item.Foo happened to be at all expensive (I wouldn't expect this, given that it's a property, but it's always possible), allowing it to be parallelized could be advantageous.

In addition, in .NET 4, the partitioning strategy used by the TPL will handle this fairly well. It was specifically designed to handle situations with varying levels of work. It does partitioning in "chunks", so one item does not get sent to one thread; rather, a thread gets assigned a set of items, which it processes in bulk. Depending on how often item.Foo is false, parallelizing (using TPL) could quite possibly be faster than filtering in advance.
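The chunking behaviour described above can be made visible with .NET 4's `System.Collections.Concurrent.Partitioner`, which hands each worker a whole index range rather than one element at a time. This is just an illustrative sketch; the actual chunk boundaries are chosen by the runtime and will vary from run to run:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        int[] items = new int[1000];

        // Partitioner.Create(0, N) splits [0, N) into ranges; each thread
        // claims a whole range at a time instead of one element per dispatch,
        // so a cheap "Foo is false, skip" iteration costs almost nothing.
        Parallel.ForEach(
            Partitioner.Create(0, items.Length),
            range =>
            {
                Console.WriteLine("Thread {0} got [{1}, {2})",
                    Thread.CurrentThread.ManagedThreadId,
                    range.Item1, range.Item2);

                for (int i = range.Item1; i < range.Item2; i++)
                {
                    // per-item work (or cheap skip) goes here
                }
            });
    }
}
```

Each `Console.WriteLine` corresponds to one chunk, so you typically see far fewer lines than there are items.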

Reed Copsey
+1. Insightful.
Mehrdad Afshari
I hadn't thought about it from that perspective. Thanks. Good answer.
locster
Just in case you're interested in the .NET 4 implementation, I actually recently blogged about work partitioning strategies in the TPL: http://reedcopsey.com/2010/01/26/parallelism-in-net-part-5-partitioning-of-work/
Reed Copsey