We have a fairly simple program that's used for creating backups. I'm attempting to parallelize it, but I'm getting an OutOfMemoryException wrapped in an AggregateException. Some of the source folders are quite large, and the program doesn't crash until about 40 minutes after it starts. I don't know where to start looking, so the code below is a near-exact dump of all of the code, minus the directory structure and the exception logging code. Any advice as to where to start looking?
using System;
using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;

namespace SelfBackup
{
    class Program
    {
        static readonly string[] saSrc = {
            "\\src\\dir1\\",
            //...
            "\\src\\dirN\\", //this folder is over 6 GB
        };
        static readonly string[] saDest = {
            "\\dest\\dir1\\",
            //...
            "\\dest\\dirN\\",
        };

        static void Main(string[] args)
        {
            Parallel.For(0, saDest.Length, i =>
            {
                string sDest = saDest[i];
                try
                {
                    if (Directory.Exists(sDest))
                    {
                        //Delete the destination first so old files get cleaned up
                        Directory.Delete(sDest, true);
                    }
                    //recursive function
                    clsCopyDirectory.copyDirectory(saSrc[i], sDest);
                }
                catch (Exception e)
                {
                    //standard error logging
                    CL.EmailError();
                }
            });
        }
    }
}
///////////////////////////////////////
using System.IO;
using System.Threading.Tasks;

namespace SelfBackup
{
    static class clsCopyDirectory
    {
        static public void copyDirectory(string Src, string Dst)
        {
            Directory.CreateDirectory(Dst);

            /* Copy all the files in the folder.
               If and when .NET 4.0 is installed, change Directory.GetFiles to
               Directory.EnumerateFiles for slightly better performance. */
            Parallel.ForEach<string>(Directory.GetFiles(Src), file =>
            {
                /* An exception thrown here may be arbitrarily deep into this
                   recursive function. There's also a good chance that if one copy
                   fails here, other files in the same directory will fail too, so
                   we don't want to spam out hundreds of error e-mails, but we
                   don't want to abort altogether either. Instead, the best
                   solution is probably to throw back up to the original caller of
                   copyDirectory and move on to the next Src/Dst pair, by not
                   catching any possible exception here. */
                File.Copy(file,                                      //src
                          Path.Combine(Dst, Path.GetFileName(file)), //dest
                          true);                                     //bool overwrite
            });

            //Call this function again for every directory in the folder.
            Parallel.ForEach(Directory.GetDirectories(Src), dir =>
            {
                copyDirectory(dir, Path.Combine(Dst, Path.GetFileName(dir)));
            });
        }
    }
}
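For what it's worth, the .NET 4.0 change mentioned in the comment would only swap the enumerator; something like this (just a sketch, I haven't run it):

Parallel.ForEach(Directory.EnumerateFiles(Src), file =>
{
    //EnumerateFiles streams file names lazily instead of building the whole array up front
    File.Copy(file, Path.Combine(Dst, Path.GetFileName(file)), true);
});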
The Threads debug window shows 417 Worker threads at the time of the exception.
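I'm wondering whether the nested Parallel.ForEach calls are simply spawning far more worker threads than the copy can actually use, and whether capping the parallelism is the right direction. A rough sketch of what I have in mind (the limit of 4 is an arbitrary guess):

var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
Parallel.ForEach(Directory.GetFiles(Src), options, file =>
{
    //Same copy as above, just with a cap on how many copies run at once
    File.Copy(file, Path.Combine(Dst, Path.GetFileName(file)), true);
});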
EDIT: The copying is from one server to another. I'm now trying to run the code with the last Parallel.ForEach changed to a regular foreach.
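i.e. the last loop in copyDirectory now reads:

foreach (string dir in Directory.GetDirectories(Src))
{
    copyDirectory(dir, Path.Combine(Dst, Path.GetFileName(dir)));
}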