views:

194

answers:

3

I'm having some trouble putting together an algorithm for an asynchronous queue consumer thread, that is reading items off of a single queue that need to be dispatched to do some long running (several seconds at least) work.

Basically the queue can look as follows: A, A, A, A, A, B, B, A, B, A, A, A, A, A, C, B, A.

Ie. the A messages are far more common that other messages.

Our system has different concurrency values for each of the different message types, e.g. we can only execute 3 x A messages at once, but we can execute 5 x B and 4 x C messages at once.

My current (broken) algorithm is to have a single thread reading from the front of the queue and dispatching to a threadpool each job, with the body of each job waiting for enough resource to become available before executing the actual payload.

This means that if sufficient A messages arrive first, then they can "fill up" the thread pool's queue, and B+C messages are starved for much longer than necessary.

So far I've thought about having a separate thread pool for each message type (fairly low number of types), but I'm concerned about the efficiency of keeping that many threads around.

Any suggestions on how I can improve on this?

+4  A: 

Why not have your single dispatcher, dispatch to seperate queues which are then based on the type of message. So you would have 4 dispatchers total, the first one sends message to three other queues.

Then you have seperate queue readers which pull the messages off the queue based on their own rules.

JoshBerke
A: 

I'm not sure that having a seperate threadpool for each message type is that bad. You'll simply have to do it and see what happens.

As for alternatives, you could create a wrapper around threadpool and implement a priority queue (http://en.wikipedia.org/wiki/Priority_queue). This implicity will assign priority to certain messages. In your case, since C is the least common, it can always prioritize C higher. I think you get the point.

Craig Wilson
A: 

First off are the following assumptions valid?

  • It doesn't matter what order the jobs actually execute in.
  • The queue is just a mechanism for recording the jobs to undertake.
  • The jobs are all independant.
  • The are always multiple jobs waiting to be processed.

If this is the case then I think this is an example of a job shop scheduling problem. I think that these are usually modelled using a bin packing algorith - a google search on the above topics shoudl find lots of references.

It may well be that because your constraints are so simple a knapsack packing algorithm is more suitable that the bin packing, just do a google for knapsack problem.

Jackson
* Its preferable that the jobs within a specific class run in order.* The queue is yes just there to persist the jobs to be undertaken.* They are.* Not always, the queue could be empty.Have you got any specific links for bin packing that apply? I'm fairly sure I get the idea of that problem domain and I don't see how it relates. Thanks!
Kieran Benton
At any given time you have a set of jobs to run and an amount of resource to run them in. Each job has a cost and you want to run the best set of jobs you can to maximise the value of the running jobs without exceeding the resources available. The knapsack problem shows how to select a set of jobs so as to maximise the value.
Jackson