When F# comes out, I am going to have an embarrassment of riches in the asynchronous/parallel programming area. An answer to this question does a pretty good job of describing the differences between Tasks, Parallel LINQ, and Reactive Framework, but I was wondering how asynchronous workflows fit into the picture, exactly.

Please correct me if I'm wrong, but as I understand it, async workflows are going to be the easiest way to work with IO-bound operations, particularly those that have an AsyncXxx method defined, or follow the BeginXxx/EndXxx pattern. Another advantage is that async workflows are composable, and can be built out of other async workflows - which can allow for a great deal of flexibility in the way a program is structured.
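To make the composability point concrete, here is a minimal sketch, assuming a plain `Stream` as the IO source. `readChunk` and `readTwoChunks` are hypothetical names used only for illustration; `Async.FromBeginEnd` is the standard way to wrap a BeginXxx/EndXxx pair.

```fsharp
open System
open System.IO

// Hypothetical helper: wraps the BeginRead/EndRead pair as an Async<int>.
let readChunk (stream: Stream) (buffer: byte[]) : Async<int> =
    Async.FromBeginEnd(
        (fun (callback: AsyncCallback, state: obj) ->
            stream.BeginRead(buffer, 0, buffer.Length, callback, state)),
        (fun (iar: IAsyncResult) -> stream.EndRead(iar)))

// A larger workflow built out of the smaller one.
let readTwoChunks (stream: Stream) : Async<int * int> =
    async {
        let first = Array.zeroCreate<byte> 4
        let second = Array.zeroCreate<byte> 4
        let! n1 = readChunk stream first
        let! n2 = readChunk stream second
        return n1, n2
    }

// Usage: a 6-byte in-memory stream stands in for a real IO source.
let counts =
    use stream = new MemoryStream([| 1uy; 2uy; 3uy; 4uy; 5uy; 6uy |])
    readTwoChunks stream |> Async.RunSynchronously
```

Nothing runs until the composed workflow is started, which is what lets the pieces be combined freely beforehand.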

I guess what I need help with is understanding under what circumstances I would choose Tasks or PLINQ over asynchronous workflows, in F# code. I believe I read that the Task Parallel Library has more sophisticated ways to balance the load across cores. If that's true, then Tasks might be a better choice for purely CPU-bound operations that need to operate in parallel. PLINQ, on the other hand, seems to be mainly a convenient way to parallelize existing code that works with sequences.

Finally, assuming that my understanding of the strengths of each approach is correct, is it ever possible or advisable to combine them? For example, perhaps one could compose a series of operations out of asynchronous workflows, and then transform them to Tasks prior to execution - if that's possible, or even a good idea.

+1  A: 

Async workflows are implemented via the F# monadic syntax (computation expressions). This means that rather than transforming your workflows into tasks, you could write your own version of "async" that was based on the Task Parallel Library. I say this with a couple of caveats:

  • It would be difficult to do.
  • Async operations that use the BeginXxx/EndXxx pattern in .NET register their callbacks in the thread pool. I'm not sure you could change this to redirect them to use tasks instead.

For more details of how to implement a monad in F#, see the "Expert F#" book or google a bit on "F# monads".
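As a rough illustration of the idea only, here is a toy builder that gives `Task<'T>` a workflow-style syntax. `TaskishBuilder` and `taskish` are made-up names, and this sketch ignores exceptions, cancellation, loops, and resource handling, which is where the real difficulty lies.

```fsharp
open System
open System.Threading.Tasks

// Hypothetical, minimal builder: only Return and Bind are supported.
type TaskishBuilder() =
    // Wrap a plain value in an already-completed task.
    member this.Return(x: 'T) : Task<'T> = Task.FromResult x

    // Run the continuation when the first task finishes, then flatten
    // the resulting Task<Task<'U>> into a Task<'U>.
    member this.Bind(t: Task<'T>, f: 'T -> Task<'U>) : Task<'U> =
        t.ContinueWith(Func<Task<'T>, Task<'U>>(fun prev -> f prev.Result)).Unwrap()

let taskish = TaskishBuilder()

// Usage: let! now waits on tasks instead of Async values.
let sum : Task<int> =
    taskish {
        let! a = Task.FromResult 20
        let! b = Task.Run(Func<int>(fun () -> 22))
        return a + b
    }
```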

Not a complete answer I know, but hope it helps a bit.

Robert
Okay, so no simple way to transition between async workflows and tasks. Good to know. Anyone have a comment on how to know when to use tasks instead of async workflows?
Joel Mueller
Actually there is a way to transform between one and the other, see @kvb's answer. My answer is more an illustration of something you could do if you felt like experimenting.
Robert
+7  A: 

See http://stackoverflow.com/questions/1871168/f-task-parallel-library-vs-async-workflows.

I'd summarize the basics as follows:

Task Parallel Library: Allows multiple units of work to run efficiently on multiple cores, including relatively simple scenarios such as spawning multiple threads to do similar computations in parallel as well as more complicated operations where the computations themselves also end up spawning additional tasks. Uses the improved .NET 4.0 threadpool and work stealing queues to ensure that all cores are kept busy.

Async workflows: Allows asynchronous computations to run without occupying unneeded threads, initiating callbacks when results are available.

PLINQ: Code written using PLINQ ends up running via the TPL, but this is a nicer interface for code which is easily expressed using LINQ queries (e.g. doing a single operation on each item in an array of data in parallel).

Note that async workflows can be converted to Tasks using the Async.StartAsTask method, and Tasks can be converted to Asyncs using the Async.AwaitTask method, so it's possible to bridge the technologies, although they do aim at slightly different target scenarios.
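A minimal sketch of that round trip (the workflow itself is just a placeholder computation):

```fsharp
open System.Threading.Tasks

// Placeholder workflow; any Async<'T> would do here.
let workflow : Async<int> = async { return 6 * 7 }

// Async -> Task: starts the workflow and hands back a Task<int>.
let asTask : Task<int> = Async.StartAsTask workflow

// Task -> Async: wraps an existing task so it can be used with let! in a workflow.
let backToAsync : Async<int> = Async.AwaitTask asTask

let result = Async.RunSynchronously backToAsync
```

Note that `Async.StartAsTask` starts the computation immediately, whereas an `Async` value on its own does nothing until it is run.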

To me, the rule of thumb would be that if you're actively doing lots of computation on different threads, you'll want to use the TPL (possibly via PLINQ or an F# equivalent such as the PSeq module), whereas if you're trying to do lots of IO (whether parallel or not), you should use async workflows. So a raytracer would use TPL to kick off tasks to render each pixel (or scanline) in parallel, maximizing the available computing power on your computer. But downloading a bunch of webpages would be done with async workflows since there's not much computation to spread among cores; you just need to be notified by the OS when the results have come in.
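Here is a small sketch of the two styles side by side. `renderScanline` and `fakeDownload` are hypothetical stand-ins (a dummy calculation and a timed sleep), not real rendering or networking code; `Array.Parallel.map` is used for the CPU-bound half because it ships with FSharp.Core and runs on the TPL.

```fsharp
// CPU-bound: a dummy per-scanline computation, spread across cores.
let renderScanline (y: int) : int =
    Seq.sum (seq { for x in 0 .. 999 -> (x * y) % 7 })

let rendered : int[] = Array.Parallel.map renderScanline [| 0 .. 99 |]

// IO-bound: the sleep stands in for waiting on the network,
// and no thread is blocked while it is pending.
let fakeDownload (url: string) : Async<int> =
    async {
        do! Async.Sleep 50
        return url.Length
    }

// Start all "downloads" together and collect the results in input order.
let downloadAll (urls: string list) : Async<int[]> =
    urls |> List.map fakeDownload |> Async.Parallel

let lengths = downloadAll [ "ab"; "cde" ] |> Async.RunSynchronously
```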

kvb
Interesting, thank you. So if I had a sensible reason to download a bunch of web pages, and then kick off a ray tracer based on the data, would it be better to have an async step followed by a task step, or use `StartAsTask` to convert the async calls into a Task that does async calls and then spawns more tasks?
Joel Mueller
@Joel: It would probably be worth profiling each option to see which works better, but I wouldn't expect much difference either way. Eventually, all of the tasks would be running against the same threadpool, so I would think that whether they came from different `async` blocks wouldn't matter, in which case it's really a personal preference. At the same time, there could be some sort of subtle interaction I'm missing, so it's worth trying both ways just in case.
kvb