



The current setup goes something like this:

items
|> Seq.map (fun item -> async { return f item })
|> Async.Parallel
|> Async.RunSynchronously

The problem is, this tends to create too many threads and crash the application periodically.

How can I limit the number of threads in this case (to, say, Environment.ProcessorCount)?


There are a couple of things you might do.

First, since this uses the ThreadPool, you can use ThreadPool.SetMaxThreads.
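As a rough sketch of that first option, assuming the goal is to cap worker threads at Environment.ProcessorCount and leave the I/O completion port limit alone: the setting applies to the whole process, and SetMaxThreads reports whether the request was accepted rather than throwing, so the result is worth checking.

```fsharp
open System
open System.Threading

// Keep the existing I/O completion port limit; only the worker limit changes.
let _, ioThreads = ThreadPool.GetMaxThreads()

// SetMaxThreads returns false (and changes nothing) if the request is rejected,
// for example when the value is below the pool's current minimum.
let accepted = ThreadPool.SetMaxThreads(Environment.ProcessorCount, ioThreads)

printfn "Worker thread cap applied: %b" accepted
```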

Second, you could introduce your own throttle along these lines:

let throttle = makeThrottle(8)
items
|> Seq.map (fun item -> async { do! throttle.Wait()
                                return f item })
|> Async.Parallel
|> Async.RunSynchronously

makeThrottle() would not be too hard to write, but it would incur a little synchronization overhead. If you are trying to parallelize so many things that you're running out of memory, the throttle overhead is likely to be a non-issue. (Let me know if you need a sample for this kind of code.)
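For what it's worth, here is one way makeThrottle could be sketched. It assumes SemaphoreSlim.WaitAsync (.NET 4.5 or later) is available. The Throttle type, its Release member, and the runThrottled wrapper are names made up for this illustration. Note that a slot has to be handed back once the work finishes, which the short snippet above leaves out.

```fsharp
open System.Threading

// Hypothetical throttle type: at most `limit` callers hold a slot at once.
type Throttle(limit: int) =
    let gate = new SemaphoreSlim(limit)
    // Asynchronously waits for a free slot without blocking a thread.
    member this.Wait() : Async<unit> = gate.WaitAsync() |> Async.AwaitTask
    // Returns the slot so that another waiting computation can start.
    member this.Release() : unit = gate.Release() |> ignore

let makeThrottle (limit: int) = Throttle(limit)

// Hypothetical wrapper: applies f to every item, at most `limit` at a time.
let runThrottled (limit: int) (f: 'a -> 'b) (items: seq<'a>) : 'b[] =
    let throttle = makeThrottle limit
    items
    |> Seq.map (fun item ->
        async {
            do! throttle.Wait()
            try
                return f item
            finally
                throttle.Release()
        })
    |> Async.Parallel
    |> Async.RunSynchronously

let squares = runThrottled 8 (fun x -> x * x) [ 1 .. 20 ]
```

Because the wait is asynchronous, computations that have not yet acquired a slot do not tie up pool threads; only the calls to f that are actually running do.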

Finally, if this is really crashing things, it smells like you may be doing something wrong. The ThreadPool typically (but not always) does a good job managing itself. But in various circumstances, designing your own throttle may be valuable to your app anyway.


If you want to parallelize a CPU-intensive calculation that takes an array (or any sequence) as input, then it may be a better idea to use the PSeq module from the F# PowerPack (which is available only on .NET 4.0, though). It provides parallel versions of many standard functions. For more information, you can also look at the F# translation of the Parallel Programming with .NET samples.

The code to solve your problem would be a bit simpler than using workflows:

array |> PSeq.map f
      |> PSeq.toArray

Some differences between the two options are:

  • PSeq is built on the Task Parallel Library (TPL) from .NET 4.0, which is optimized for working with a large number of CPU-intensive tasks.
  • Async is implemented in F# libraries and supports asynchronous (non-blocking) operations such as I/O in the concurrently running operations.

In summary, if you need asynchronous operations (e.g., I/O), then Async is the best option. If you have a large number of CPU-intensive tasks, then PSeq may be a better choice (on .NET 4.0).
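To tie the PSeq option back to the original question about limiting parallelism, a minimal sketch follows. It assumes the FSharp.Collections.ParallelSeq NuGet package (the current home of the PowerPack PSeq module) and uses a placeholder f that simply squares its input. PSeq.withDegreeOfParallelism sets an upper bound on how many items are processed concurrently.

```fsharp
#r "nuget: FSharp.Collections.ParallelSeq"

open System
open FSharp.Collections.ParallelSeq

// Hypothetical stand-in for the CPU-bound function `f` from the question.
let f (x: int) = x * x

let results =
    [| 1 .. 20 |]
    // Upper bound on concurrently processed items.
    |> PSeq.withDegreeOfParallelism Environment.ProcessorCount
    |> PSeq.map f
    |> PSeq.toArray
```

The results are not guaranteed to come back in input order; add PSeq.ordered before the map if the order matters.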

Tomas Petricek
We have solved it differently, but this is a good answer. Unfortunately, we can't use .NET 4.0.

Do: f array


Jon Harrop