The current setup looks something like this:

array
|> Seq.map (fun item -> async { return f item})
|> Async.Parallel
|> Async.RunSynchronously

The problem is, this tends to create too many threads and crash the application periodically.

How can I limit the number of threads in this case (to, say, Environment.ProcessorCount)?

A: 

There are a couple of things you might do.

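First, since this runs on the ThreadPool, you can cap the pool with ThreadPool.SetMaxThreads. Keep in mind that this setting applies to the whole process, not just to this computation.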
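A minimal sketch of this first option, assuming the goal is to cap worker threads at Environment.ProcessorCount. The helper name limitWorkerThreads is hypothetical. ThreadPool.SetMaxThreads takes both a worker-thread limit and an I/O completion-port limit, and it returns false when the runtime rejects the request (for example, when a limit is below the number of processors), so the sketch keeps the existing completion-port limit and returns the result to the caller.

```fsharp
open System
open System.Threading

// Hypothetical helper: cap the pool's worker threads while leaving the
// I/O completion-port limit unchanged. Returns false if the runtime
// rejects the new limit.
let limitWorkerThreads (maxWorkers: int) : bool =
    let mutable workers = 0
    let mutable completionPorts = 0
    ThreadPool.GetMaxThreads(&workers, &completionPorts)
    ThreadPool.SetMaxThreads(maxWorkers, completionPorts)

let applied = limitWorkerThreads Environment.ProcessorCount
printfn "Limit applied: %b" applied
```

Because the limit affects every user of the pool in the process, this is a blunt tool; it is probably best applied once at startup.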

Second, you could introduce your own throttle along these lines:

let throttle = makeThrottle(8)
array 
|> Seq.map (fun item -> async { do! throttle.Wait()
                                return f item}) 
|> Async.Parallel 
|> Async.RunSynchronously 

makeThrottle() would not be too hard to write, but it would incur a little synchronization overhead. If you are trying to parallelize so many things that you're running out of memory, the throttle overhead is likely to be a non-issue. (Let me know if you need a sample for this kind of code.)
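For what it's worth, here is one way such a throttle could be sketched. This is an assumption-laden illustration, not the only design: it relies on SemaphoreSlim.WaitAsync (so it assumes .NET 4.5 or later), and the Throttle type, its Release() member, and the throttledMap helper are all hypothetical names. The snippet above only shows the Wait() call; a semaphore-based throttle also needs a matching release once f has finished, which is done here in a finally block.

```fsharp
open System.Threading

// Hypothetical throttle: at most maxConcurrent holders at any time.
type Throttle(maxConcurrent: int) =
    let semaphore = new SemaphoreSlim(maxConcurrent)
    // Asynchronously waits for a free slot without blocking a thread.
    member this.Wait() : Async<unit> =
        Async.AwaitTask(semaphore.WaitAsync())
    // Frees the slot so another waiting computation can proceed.
    member this.Release() : unit =
        semaphore.Release() |> ignore

let makeThrottle (maxConcurrent: int) = Throttle(maxConcurrent)

// Maps f over items, running at most maxConcurrent calls to f at once.
let throttledMap (maxConcurrent: int) (f: 'a -> 'b) (items: seq<'a>) : 'b[] =
    let throttle = makeThrottle maxConcurrent
    items
    |> Seq.map (fun item ->
        async {
            do! throttle.Wait()
            try
                return f item
            finally
                // Always release, even if f throws.
                throttle.Release()
        })
    |> Async.Parallel
    |> Async.RunSynchronously

let squares = throttledMap 4 (fun x -> x * x) [ 1 .. 10 ]
```

Recent FSharp.Core versions also accept an optional maxDegreeOfParallelism argument on Async.Parallel, which may make a hand-written throttle unnecessary where it is available.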

Finally, if this is really crashing things, it smells like you may be doing something wrong. The ThreadPool typically (but not always) does a good job managing itself. But in various circumstances, designing your own throttle may be valuable to your app anyway.

Brian
+2  A: 

If you want to parallelize a CPU-intensive calculation that takes an array (or any sequence) as input, then it may be a better idea to use the PSeq module from the F# PowerPack (which is available only on .NET 4.0, though). It provides parallel versions of many standard Array.xyz functions. For more information, you can also look at the F# translation of the Parallel Programming with .NET samples.

The code to solve your problem would be a bit simpler than using workflows:

array |> PSeq.map f
      |> PSeq.toArray 

Some differences between the two options are:

  • PSeq is built on the Task Parallel Library (TPL) from .NET 4.0, which is optimized for working with a large number of CPU-intensive tasks.
  • Async is implemented in the F# libraries and supports asynchronous (non-blocking) operations, such as I/O, inside the concurrently running computations.

In summary, if you need asynchronous operations (e.g. I/O), then Async is the best option. If you have a large number of CPU-intensive tasks, then PSeq may be a better choice (on .NET 4.0).

Tomas Petricek
We have solved it differently, but this is a good answer. Unfortunately, we can't use .NET 4.0.
Alexander
A: 

Do:

Array.Parallel.map f array

instead.
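As a small illustration, assuming f is a pure, CPU-bound function: the f and input below are placeholders, not the asker's actual code. The sketch shows that Array.Parallel.map has the same call shape as Array.map and returns results in the same order.

```fsharp
// Hypothetical CPU-bound function standing in for f.
let f (x: int) : int = x * x

let input = [| 1 .. 1000 |]

// Sequential baseline.
let sequential = Array.map f input

// Parallel version: same call shape, with the work split across threads.
let parallelResult = Array.Parallel.map f input

printfn "Same results: %b" (sequential = parallelResult)
```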

Jon Harrop