views: 978
answers: 8

Hi,

Currently, I have a large number of C# computations (method calls) residing in a queue that will be run sequentially. Each computation will use some high-latency service (network, disk...).

I was going to use Mono coroutines to allow the next computation in the queue to continue while a previous computation waits for its high-latency service to return. However, I would prefer not to depend on Mono coroutines.

Is there a design pattern that's implementable in pure C# that will enable me to process additional computations while waiting for high latency services to return?

Thanks

Update:

I need to execute a huge number (>10000) of tasks, and each task will use some high-latency service. On Windows, you can't create that many threads.

Update:

Basically, I need a design pattern that emulates the following advantages of tasklets in Stackless Python (http://www.stackless.com/):

  1. Huge # of tasks
  2. If a task blocks, the next task in the queue executes
  3. No wasted CPU cycles
  4. Minimal overhead switching between tasks
+4  A: 

.NET 4.0 comes with extensive support for task parallelism via the Task Parallel Library (System.Threading.Tasks).
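
A minimal sketch of what that can look like; CallHighLatencyService is a hypothetical stand-in for the real work:

using System;
using System.Threading.Tasks;

class TplSketch
{
    static void Main()
    {
        var tasks = new Task[10000];
        for (int i = 0; i < tasks.Length; i++)
        {
            int id = i;   // capture the loop variable for the closure
            tasks[i] = Task.Factory.StartNew(() => CallHighLatencyService(id));
        }
        Task.WaitAll(tasks);   // block until everything finishes
    }

    // Hypothetical stand-in for the questioner's high-latency work.
    static void CallHighLatencyService(int id)
    {
        /* network / disk call goes here */
    }
}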

dtb
Yeah, that doesn't solve the problem of continuing with the next computation while a latency-intensive operation is in progress. Task parallelism just makes parallel computing easier.
jameszhao00
The Task Parallel Library is free to use more threads than cores, so if it spots that the threads in use are not consuming much CPU time, it can schedule more tasks. This can lead to excessive I/O, so it needs careful tuning; the library is meant to do much of this for you, but benchmarking and checking is always a good idea.
ShuggyCoUk
True, threads are a bit too heavyweight for this project. I need 20k-80k computation tasks running at one time.
jameszhao00
That's exactly what the task parallel library was designed for. Just fire off your tasks and let the library assign them to cores/threads.
dtb
Will the task parallelism library create new threads if existing threads are waiting?
jameszhao00
The library will create the optimal number of threads and continually adjust that number to keep it optimal. So, yes, it will create new threads if doing so increases performance.
dtb
"The TPL scales the degree of concurrency dynamically to most efficiently use all the processors that are available." From the Microsoft TPL documentation. I highly recommend looking into the TPL...it sounds like it is an ideal solution to your problem.
jrista
+1  A: 

Isn't this a conventional use of multi-threaded processing?

Have a look at patterns such as Reactor.

djna
Sorry. I'm a bit confused as to how that can be used here.
jameszhao00
+1  A: 

Writing it to use async I/O might be sufficient.

Without strong structure in the design, this can lead to nasty, hard-to-debug code.
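
For example, the Begin/End (APM) pattern exposed by most .NET I/O classes keeps threads free while an operation is in flight. A minimal sketch (the data.bin file is just a placeholder):

using System;
using System.IO;

class AsyncReadSketch
{
    static void Main()
    {
        // The final 'true' requests asynchronous (overlapped) I/O.
        var stream = new FileStream("data.bin", FileMode.Open, FileAccess.Read,
                                    FileShare.Read, 4096, true);
        var buffer = new byte[4096];

        stream.BeginRead(buffer, 0, buffer.Length, ar =>
        {
            int bytesRead = stream.EndRead(ar);   // completes the read
            Console.WriteLine("Read {0} bytes", bytesRead);
            stream.Close();
        }, null);

        // No thread blocks while the read is in flight; the initiating
        // thread is free to start the next computation here.
        Console.ReadLine();
    }
}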

ShuggyCoUk
On a lower layer, yes, I will be using async I/O to send/receive network packets. However, on the higher layers I will be implementing some sort of synchronous RPC.
jameszhao00
A: 

Some more information about the "Reactive" pattern (as mentioned by another poster) with respect to an implementation in .NET, aka "Linq to Events":

http://themechanicalbride.blogspot.com/2009/07/introducing-rx-linq-to-events.html

-Oisin

x0n
This looks like a "call dispatcher table" idea. I don't quite get how it can solve the issue at hand.
jameszhao00
+5  A: 

I'd recommend using the thread pool to execute multiple tasks from your queue at once, in manageable batches, using a list of active tasks that feeds off the task queue.

In this scenario, your main worker thread would initially pop N tasks from the queue into the active tasks list and dispatch them to the thread pool (most likely using QueueUserWorkItem). Here N represents a manageable amount: small enough not to overload the thread pool, bog your app down with thread scheduling and synchronization costs, or exhaust available memory through the combined I/O memory overhead of each task.

Whenever a task signals completion to the worker thread, you can remove it from the active tasks list and add the next one from your task queue to be executed.

This will allow you to have a rolling set of N tasks from your queue. You can manipulate N to affect the performance characteristics and find what is best in your particular circumstances.
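
In code, the rolling window might look something like this minimal sketch, which approximates the scheme with a semaphore rather than an explicit active-task list (RollingDispatcher is a hypothetical helper, not a library class):

using System;
using System.Collections.Generic;
using System.Threading;

class RollingDispatcher
{
    private readonly Queue<Action> _pending = new Queue<Action>();
    private readonly Semaphore _slots;   // caps the number of tasks in flight at N

    public RollingDispatcher(int maxActive)
    {
        _slots = new Semaphore(maxActive, maxActive);
    }

    // Not thread-safe: assumes all Enqueue calls happen before Run.
    public void Enqueue(Action task)
    {
        _pending.Enqueue(task);
    }

    // Runs on the main worker thread: keeps N tasks on the pool at all times.
    public void Run()
    {
        while (_pending.Count > 0)
        {
            Action task = _pending.Dequeue();
            _slots.WaitOne();   // block until one of the N slots frees up
            ThreadPool.QueueUserWorkItem(_ =>
            {
                try { task(); }
                finally { _slots.Release(); }   // completion frees a slot
            });
        }
    }
}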

Since you are ultimately bottlenecked by hardware operations (disk I/O, network I/O, CPU), I imagine a smaller N is better. Two thread pool tasks working on disk I/O most likely won't execute faster than one.

You could also implement flexibility in the size and contents of the active task list by restricting it to a set number of each particular type of task. For example, if you are running on a machine with 4 cores, you might find that the highest-performing configuration is four CPU-bound tasks running concurrently along with one disk-bound task and one network-bound task.

If you already have one task classified as a disk I/O task, you may choose to wait until it is complete before adding another disk I/O task, and you may choose to schedule a CPU-bound or network-bound task in the meantime.

Hope this makes sense!

PS: Do you have any dependencies on the order of tasks?

jscharf
No. There's no requirement on the order of execution.
jameszhao00
Let me see if I got this. For each core a specific number of threads will reside in a thread pool. Initially, a task is assigned to a thread and it executes. Each time a task blocks (I/O, ...), the task notifies/wakes the controller thread for that CPU core and the controller starts a new thread or wakes a previously sleeping one. This continues until all the tasks have been processed.
jameszhao00
You're a bit off, but I think you have the gist of it. You should read the ThreadPool documentation (or Google for some tutorials using QueueUserWorkItem). There aren't really threads created for each core. Think of the ThreadPool as an abstraction independent of cores. You simply throw multiple tasks at it that you need scheduled and executed concurrently whenever possible (which is usually).
jscharf
The solution you've given leads to 2 scenarios: it either creates a thread for each task, or it partitions the tasks among a few threads (where the # of threads is close to the # of cores), and those threads execute the tasks assigned to them sequentially. The first scenario is not feasible since there will be way too many threads (# of tasks > 10000). In the second scenario, because the tasks on one thread are executed sequentially, a blocking operation will waste CPU cycles.
jameszhao00
The solution I posted avoids your first scenario. You will not be queuing all 10000+ tasks at once. You will be queuing a small amount (e.g. 5) at once to be dispatched to the ThreadPool. When a task on the pool is complete, it is removed from your 'active tasks' list and another one is added. By maintaining your own 'active tasks' list being executed by the pool, you are assured that there will never be a cumbersome amount of threads running.
jscharf
Furthermore, if you are utilizing asynchronous IO properly, your task threads should sleep while waiting for their latency-bound operations to complete, allowing effective thread scheduling for all active tasks.
jscharf
I think I forgot to mention this before, but almost always each task will be waiting on some high latency operation. In the scenario above, a massive number of sleeping threads will be waiting.
jameszhao00
@jscharf and jameszhao00: by default the ThreadPool contains 25 threads, way more than the # of cores but still reasonable. You could tune it. The main advantage is avoiding the overhead of creating (many) threads. The limit and the queue manage the workload, but the TPL does this in a more advanced way.
Henk Holterman
+4  A: 

You can simulate cooperative microthreading using IEnumerable. Unfortunately this won't work with blocking APIs, so you need to find APIs that you can poll, or which have callbacks that you can use for signalling.

Consider a method

IEnumerable Thread ()
{
    //do some stuff
    Foo ();

    //co-operatively yield control back to the scheduler
    yield return null;

    //do some more stuff
    Bar ();

    //sleep 2 seconds (new TimeSpan (2000) would be 2000 ticks, not 2 seconds)
    yield return TimeSpan.FromSeconds (2);
}

The C# compiler will unwrap this into a state machine - but the appearance is that of a co-operative microthread.

The pattern is quite straightforward. You implement a "scheduler" that keeps a list of all the active IEnumerators. As it cycles through the list, it "runs" each one using MoveNext(). If MoveNext() returns false, the thread has ended, and the scheduler removes it from the list. If it returns true, the scheduler inspects the Current property to determine the thread's current state. If it's a TimeSpan, the thread wishes to sleep, and the scheduler moves it onto a queue that is flushed back into the main list once the sleep timespans have elapsed.
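
As a rough illustration, a bare-bones version of that loop (handling only the null and TimeSpan cases described above) might look like this:

using System;
using System.Collections;
using System.Collections.Generic;

class Scheduler
{
    private readonly List<IEnumerator> _active = new List<IEnumerator>();
    private readonly List<KeyValuePair<DateTime, IEnumerator>> _sleeping =
        new List<KeyValuePair<DateTime, IEnumerator>>();

    public void Add(IEnumerable microthread)
    {
        _active.Add(microthread.GetEnumerator());
    }

    public void Run()
    {
        while (_active.Count > 0 || _sleeping.Count > 0)
        {
            // Wake any sleepers whose timespan has elapsed.
            DateTime now = DateTime.UtcNow;
            _sleeping.RemoveAll(pair =>
            {
                if (pair.Key > now) return false;
                _active.Add(pair.Value);
                return true;
            });

            // Give each active microthread one step.
            for (int i = _active.Count - 1; i >= 0; i--)
            {
                IEnumerator thread = _active[i];
                if (!thread.MoveNext())
                {
                    _active.RemoveAt(i);   // the microthread has ended
                }
                else if (thread.Current is TimeSpan)
                {
                    _active.RemoveAt(i);   // it asked to sleep
                    _sleeping.Add(new KeyValuePair<DateTime, IEnumerator>(
                        now + (TimeSpan)thread.Current, thread));
                }
                // null just means "let someone else run"; keep it active
            }

            if (_active.Count == 0)
                System.Threading.Thread.Sleep(1);   // everyone is asleep
        }
    }
}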

You can use other return objects to implement other signalling mechanisms. For example, define some kind of WaitHandle. If the thread yields one of these, it can be moved to a waiting queue until the handle is signalled. Or you could support WaitAll by yielding an array of wait handles. You could even implement priorities.

I did a simple implementation of this scheduler in about 150 LOC, but I haven't got round to blogging the code yet. It was for our PhyreSharp PhyreEngine wrapper (which won't be public), where it seems to work pretty well for controlling a couple of hundred characters in one of our demos. We borrowed the concept from the Unity3D engine -- they have some online docs that explain it from a user's point of view.

mhutch
Interesting stuff. I looked at this before, but I don't think you can properly yield from code that's more than 1 level deep. I.e. if you have a coroutine start function that calls another function that yields, the whole thing breaks.
jameszhao00
In most cases, you can just implement the functions you want to call as IEnumerables and foreach+yield over them, though of course you'd need a bit of indirection to get return values -- I can't remember whether ref params work with yield, but *if* they don't, there are a bunch of other ways: you could pass in a reference to an object with fields and use that to get values back out of the function, pass in lambdas that can set your locals, filter a certain type from the yielded values, etc.
mhutch
I.e. Thing returned = null; foreach (var o in Function ((x) => returned = x)) yield return o;
mhutch
+2  A: 

You should definitely check out the Concurrency and Coordination Runtime (CCR). One of their samples describes exactly what you're talking about: you call out to long-latency services, and the CCR efficiently allows some other task to run while you wait. It can handle a huge number of tasks because it doesn't need to spawn a thread for each one, though it will use all your cores if you ask it to.
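
For a rough feel for the API, here is a hedged sketch of the CCR iterator pattern. The names follow the Microsoft.Ccr.Core assembly as I recall it (Dispatcher, DispatcherQueue, Port, Arbiter), so double-check against the shipped samples, and BeginLongLatencyCall is purely hypothetical:

using System;
using System.Collections.Generic;
using Microsoft.Ccr.Core;

class CcrSketch
{
    static void Main()
    {
        // 0 = one thread per core; the same small pool services every task.
        using (var dispatcher = new Dispatcher(0, "worker pool"))
        using (var queue = new DispatcherQueue("tasks", dispatcher))
        {
            for (int i = 0; i < 10000; i++)
                Arbiter.Activate(queue, Arbiter.FromIteratorHandler(Work));
            Console.ReadLine();
        }
    }

    // Yielding a receiver releases the OS thread until a result is posted,
    // so thousands of these can be pending on a handful of threads.
    static IEnumerator<ITask> Work()
    {
        var resultPort = new Port<string>();
        BeginLongLatencyCall(resultPort);   // hypothetical async service call
        yield return Arbiter.Receive(false, resultPort,
            result => Console.WriteLine(result));
    }

    static void BeginLongLatencyCall(Port<string> port)
    {
        // Placeholder: start async I/O and post the result when it completes.
        port.Post("done");
    }
}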

redtuna