Parallel.ForEach Not Spinning Up New Threads

Hello all. We have a very I/O-intensive operation that we wrote using Parallel.ForEach from Microsoft's Parallel Extensions for the .NET Framework. We need to delete a large number of files, and we represent the files to be deleted as a list of lists: there are 50 nested lists, each containing 1000 messages. The problem is that when I look in the logs afterwards, I see only one thread executing inside our Parallel.ForEach block.

Here's what the code looks like:

    List<List<Message>> expiredMessagesLists = GetNestedListOfMessages();
    foreach (List<Message> subList in expiredMessagesLists)
    {
        Parallel.ForEach(subList, msg =>
        {
            try
            {
                Logger.LogEvent(TraceEventType.Information, "Purging Message {0} (Extension {1}) on Thread {2}", msg.MessageID, msg.ExtensionID, Thread.CurrentThread.Name);

                DeleteMessageFiles(msg);
            }
            catch (Exception ex)
            {
                Logger.LogException(TraceEventType.Error, ex);
            }
        });
    }

I wrote some sample code with a simpler data structure and no I/O logic, and I could see several different threads executing within the Parallel.ForEach block. Are we using Parallel.ForEach incorrectly in the code above? Could the list of lists be tripping it up, or is there some sort of threading limitation for I/O operations?

A: 

The assumption underlying your code is that files can be deleted in parallel. I'm not saying they can't (I'm no expert on the matter), but I wouldn't be surprised if most hardware simply can't do that. You are, after all, performing an operation on a physical device (your hard disk) when you do this.

Suppose you had a class, Person, with a method called RaiseArm(). You could always try calling RaiseArm() on 100 different threads, but the Person is only ever going to be able to raise two arms at a time...

Like I said, I could be wrong. This is just my suspicion.
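One way to check this suspicion on your own hardware is to time sequential versus parallel deletion of the same number of files. The sketch below is hypothetical (the `DeleteTiming` class and its methods are not from the question); it only assumes standard `System.IO`, `Stopwatch`, and `Parallel.ForEach`.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper for comparing sequential and parallel file deletion.
public static class DeleteTiming
{
    // Creates `count` small files in a fresh temporary directory and returns its path.
    public static string CreateTempFiles(int count)
    {
        string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(dir);
        for (int i = 0; i < count; i++)
        {
            File.WriteAllText(Path.Combine(dir, $"file{i}.tmp"), "x");
        }
        return dir;
    }

    // Deletes every file in `dir`, either sequentially or with Parallel.ForEach,
    // and returns the elapsed wall-clock time.
    public static TimeSpan DeleteAll(string dir, bool parallel)
    {
        string[] files = Directory.GetFiles(dir);
        Stopwatch sw = Stopwatch.StartNew();
        if (parallel)
        {
            Parallel.ForEach(files, file => File.Delete(file));
        }
        else
        {
            foreach (string file in files)
            {
                File.Delete(file);
            }
        }
        sw.Stop();
        return sw.Elapsed;
    }
}
```

Run each mode against a freshly created directory several times; the result depends heavily on the disk, file system, and OS caching, so a single run is not a reliable benchmark.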

Dan Tao
A:

There are a couple of possibilities.

First off, in most cases, Parallel.ForEach will not create a new thread. It schedules its work on the .NET 4 ThreadPool (as does all of the TPL) and reuses ThreadPool threads.
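To see where the iterations actually run, you can classify each one by the thread that executed it. This is a minimal sketch; `WhereItRuns` and `Classify` are hypothetical names, and the loop body is a no-op standing in for real work.

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical diagnostic: which threads did Parallel.ForEach use?
public static class WhereItRuns
{
    // Runs a trivial Parallel.ForEach and counts iterations that ran on the
    // calling thread, on ThreadPool threads, or on any other thread.
    public static (int OnCaller, int OnPool, int Other) Classify(int iterations)
    {
        int callerId = Thread.CurrentThread.ManagedThreadId;
        int onCaller = 0, onPool = 0, other = 0;

        Parallel.ForEach(Enumerable.Range(0, iterations), _ =>
        {
            Thread current = Thread.CurrentThread;
            if (current.ManagedThreadId == callerId)
                Interlocked.Increment(ref onCaller);   // the calling thread also takes part
            else if (current.IsThreadPoolThread)
                Interlocked.Increment(ref onPool);     // reused ThreadPool workers
            else
                Interlocked.Increment(ref other);
        });

        return (onCaller, onPool, other);
    }
}
```

On a multi-core machine you would typically see iterations split between the calling thread and ThreadPool threads; the exact split varies from run to run.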

That being said, Parallel.ForEach uses a partitioning strategy based on the size of the collection passed to it. My first guess is that your "outer" list has many messages but each inner list holds only one Message instance, so the partitioner uses only a single thread. With a single element, Parallel.ForEach is smart enough to run the body on the calling thread rather than moving work onto a background thread.

Normally, in situations like this, it's better to parallelize the outer loop rather than the inner loop. That usually gives better performance (since each work item is larger), although it's difficult to know without a good sense of the loop sizes and the size of each unit of work. You could also parallelize both the inner and outer loops, but without profiling it would be difficult to tell which option is best.
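A minimal sketch of the two alternatives, written generically so it is self-contained: `deleteItem` stands in for the question's `DeleteMessageFiles`, and `NestedPurge` with its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helpers contrasting where the parallelism is applied.
public static class NestedPurge
{
    // Parallelize the outer loop: each work item is a whole sub-list,
    // whose elements are then processed sequentially.
    public static void PurgeOuterParallel<T>(IEnumerable<List<T>> lists, Action<T> deleteItem)
    {
        Parallel.ForEach(lists, subList =>
        {
            foreach (T item in subList)
            {
                deleteItem(item);
            }
        });
    }

    // Parallelize both loops: the nested Parallel.ForEach calls share the same ThreadPool.
    public static void PurgeBothParallel<T>(IEnumerable<List<T>> lists, Action<T> deleteItem)
    {
        Parallel.ForEach(lists, subList =>
        {
            Parallel.ForEach(subList, deleteItem);
        });
    }
}
```

An exception thrown by `deleteItem` surfaces from Parallel.ForEach as an `AggregateException`; if one failed delete shouldn't stop the rest, keep the per-item try/catch from the question inside `deleteItem`.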

One other possibility:

Try logging Thread.CurrentThread.ManagedThreadId instead of Thread.CurrentThread.Name. Since Parallel.ForEach runs on ThreadPool threads, which are not given unique names, the Name is usually null or identical across threads. You may think you're using only a single thread when you're in fact using more than one.
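A sketch of that logging change, under the assumption that `Console.WriteLine` stands in for the question's `Logger.LogEvent` and that `ThreadIdLogging.ProcessAndTrackThreads` is a hypothetical helper. It also tallies how many items each managed thread handled, which is a more direct check than reading the log.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: log and count work per managed thread ID.
public static class ThreadIdLogging
{
    // Processes each item in parallel and returns a map of ManagedThreadId -> items handled.
    public static IDictionary<int, int> ProcessAndTrackThreads(IEnumerable<int> items)
    {
        var itemsPerThread = new ConcurrentDictionary<int, int>();

        Parallel.ForEach(items, item =>
        {
            int id = Thread.CurrentThread.ManagedThreadId;          // distinct per managed thread
            string name = Thread.CurrentThread.Name ?? "(null)";    // not unique for pool threads
            Console.WriteLine("Purging item {0} on thread {1} (Name: {2})", item, id, name);

            itemsPerThread.AddOrUpdate(id, 1, (_, count) => count + 1);
        });

        return itemsPerThread;
    }
}
```

If the returned dictionary has more than one key, more than one thread did the work, whatever the Name column in the log says.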

Reed Copsey