Task:

Create a system that can process a large number of tasks (independent pieces of work) in a minimal amount of time.

Solution:

The system consists of two parts: a Dispatcher and a Worker (a WCF service). The Dispatcher distributes work among multiple instances (1..n) of Workers, physically located on several machines. The work needs to be distributed among the Workers evenly.

One possible solution is to track performance counters on each Worker server and add an additional thread when there are enough free resources. Is that a realistic solution? Can anybody provide a good example?

Are there ready-made solutions? Note that it is not necessary to process all tasks at the very moment they are submitted; it is possible to queue them until resources are free. That is why NLB from Microsoft is not suitable.

UPD: Something very similar to what I am looking for is here

UPD2: Microsoft has the StockTrader sample application (with sources), which is an example of a distributed SOA with hand-written round-robin load balancing.

+3  A: 

Since you are controlling both the services (Workers) and the client (Dispatcher), I would create a netTcp pub/sub service endpoint on each of the workers in addition to the current services; the dispatcher could then subscribe to each of the workers. The Dispatcher could round-robin each call to the next available worker. When a worker is ready (or has available threads), it can notify the dispatcher to put it back in the queue. This would allow you to add more functionality (notify when full, stopping, starting, ready, etc.).

Another benefit of this approach over your performance counter approach is that the dispatcher would be instantly notified when a worker is available, instead of having to poll all the workers to find the next available one. Polling can get really expensive and slow as you start adding more and more workers.
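The notify-and-queue idea could look something like this (a minimal C# sketch of the dispatch side only, ignoring the WCF plumbing; `IWorkerProxy`, `NotifyReady`, and `Dispatch` are made-up names for illustration):

```csharp
using System.Collections.Concurrent;

// Hypothetical stand-in for a WCF client proxy to one Worker instance.
public interface IWorkerProxy
{
    void Process(string task);
}

// The dispatcher keeps a queue of workers that have reported themselves
// ready. Instead of polling performance counters, each worker calls
// NotifyReady (e.g. over its netTcp pub/sub endpoint) whenever it has a
// free thread, putting itself back in the queue.
public class Dispatcher
{
    private readonly BlockingCollection<IWorkerProxy> _readyWorkers =
        new BlockingCollection<IWorkerProxy>(new ConcurrentQueue<IWorkerProxy>());

    // Called by a worker when it can accept more work.
    public void NotifyReady(IWorkerProxy worker) => _readyWorkers.Add(worker);

    // Blocks until some worker is ready, then hands it the next task.
    public void Dispatch(string task) => _readyWorkers.Take().Process(task);
}
```

Because `BlockingCollection` wraps a `ConcurrentQueue`, workers are handed tasks in the order they announced readiness, which gives you the round-robin behavior for free as long as every worker re-registers after finishing.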

CkH
Thanks for the good advice to replace polling with notifications (+1). But I am still interested in how to determine the number of available (most effective) threads. Is it possible to calculate it using performance counters?
Yauheni Sivukha
I don't remember all the performance counters by heart, but you could turn on all performance counters by putting <diagnostics performanceCounters="All"/> inside the system.serviceModel section of your config file, then take a look and see whether any useful counters are exposed for available threads.
CkH
I took a quick look at all of the perf counters exposed by serviceModel. It doesn't look like there is an existing counter for what you are looking for. You could, however, create your own perf counter monitoring ThreadPool.GetAvailableThreads(), but again, you'll have to poll this on a timer or something like that to keep it updated. I stand by my original approach :)
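For illustration, sampling ThreadPool.GetAvailableThreads() on a timer could look like this (a rough sketch; `ThreadAvailabilityMonitor` is a made-up name, and publishing the sample as a custom performance counter is left out):

```csharp
using System;
using System.Threading;

// Sketch of the polling approach described above: periodically sample
// ThreadPool.GetAvailableThreads. A real version would feed the numbers
// into a custom performance counter instead of printing them.
public static class ThreadAvailabilityMonitor
{
    public static (int Worker, int Io) Sample()
    {
        ThreadPool.GetAvailableThreads(out int workerThreads, out int ioThreads);
        return (workerThreads, ioThreads);
    }

    // Fires immediately, then on every interval, until the timer is disposed.
    public static Timer Start(TimeSpan interval)
    {
        return new Timer(_ =>
        {
            var (worker, io) = Sample();
            Console.WriteLine($"available worker threads: {worker}, IO threads: {io}");
        }, null, TimeSpan.Zero, interval);
    }
}
```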
CkH
The idea is to allow adding a new thread when there are enough free resources (CPU + memory + disk usage + network load). See the example here: http://www.codeproject.com/KB/mcpp/loadbalancing.aspx
Yauheni Sivukha
It's much easier to monitor all of these resources locally. Have the Workers monitor their hosts and report back when conditions are optimal. Are you asking how to monitor these resources?
CkH
Yes, I am looking for good examples (sources, documents) of load-balancing systems (and their parts) based on performance counters.
Yauheni Sivukha
+1  A: 

In a similar environment, we found that having the dispatcher issue every command to the workers was a bit of a bottleneck. We elected to add logic to the workers that controls how many threads each of them runs.

In order to kick off the workflow, we do have something like a dispatcher which connects to each worker service instance and tells it to go to a database (or some other resource) and look for work to do. When the worker finds work, it returns an int to the dispatcher (basically notifying the dispatcher of how many work items are still available after processing one successfully). This allows the dispatcher to immediately call that specific worker instance again, instructing it to pop another work item off the queue.

The main benefit here is that when the queue gets low, a worker instance will find the few remaining work items (eventually only one) and process one of them. When there is only one item remaining, the worker will return 0 (meaning there was one work item and it processed it), which instructs the dispatcher to exit its while loop and go back to sleep.

In our case, our dispatcher has a collection of timers that wake up and rotate through our workers, instructing them to check for available work. If no work is available, the workers return 0 and the dispatcher returns to sleep for however long that particular timer's interval is set.
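The remaining-count contract described above can be sketched like this (an illustrative reconstruction, not the actual code; `Worker` and `DispatchLoop` are hypothetical names, and an in-memory queue stands in for the database):

```csharp
using System.Collections.Generic;

// Worker side: process one item and report how many are left, so the
// dispatcher knows whether to call this worker again immediately.
public class Worker
{
    private readonly Queue<string> _workQueue; // stand-in for the database

    public Worker(Queue<string> workQueue) => _workQueue = workQueue;

    public int ProcessNext()
    {
        if (_workQueue.Count == 0) return 0;  // no work was available
        string item = _workQueue.Dequeue();
        // ... do the real work with `item` here ...
        return _workQueue.Count;              // items still waiting
    }
}

// Dispatcher side: keep calling the same worker until it reports 0,
// meaning it just processed the last item (or found the queue empty).
public static class DispatchLoop
{
    public static void Drain(Worker worker)
    {
        while (worker.ProcessNext() > 0) { }
    }
}
```

In the real system the `while` loop would run per worker on the dispatcher's timers, so several workers can drain the shared queue concurrently while the dispatcher itself stays almost idle.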

fdfrye