views:

398

answers:

1

A software application that I'm working on needs to be able to assign tasks to a group of users based on how many tasks they presently have, where the users with the fewest tasks are the most likely to get the next task. However, the current task load should be treated as a weighting, rather than an absolute order definition. IOW, I need to implement a weighted, load-balancing algorithm.

Let's say there are five users, with the following number of tasks:

A: 4 B: 5 C: 0 D: 7 E: 9

I want to prioritize the users for the next task in the order CABDE, where C is most likely to get the assignment and E, the least likely. There are two important things to note here:

  • The number of users can vary from 2 to dozens.
  • The number of tasks assigned to each user can vary from 1 to hundreds.

For now, we can treat all tasks as equal, though I wouldn't mind including task difficult as a variable that I can use in the future - but this is purely icing on the cake.

The ideas I've come up with so far aren't very good in some situations. They might weight users too closely together if there are a large number of users, or they might fall flat if a user has no current tasks, or....

I've tried poking around the web, but haven't had much luck. Can anyone give me a quick summary of an algorithm that would work well? I don't need an actual implementation--I'll do that part--just a good description. Alternative, is there a good web site that's freely accessible?

Also, while I certainly appreciate quality, this need not be statistically perfect. So if you can think of a good but not great technique, I'm interested!

+1  A: 

As you point out, this is a load-balancing problem. It's not really a scheduling problem, since you're not trying to minimise anything (total time, number of concurrent workers, etc.). There are no special constraints (job duration, time clashes, skill sets to match etc.) So really your problem boils down to selecting an appropriate weighting function.

You say there are some situations you want to avoid, like user weightings that are too close together. Can you provide more details? For example, what's wrong with making the chance of assignment just proportional to the current workload, normalised by the workload of the other workers? You can visualise this as a sequence of blocks of different lengths (the tasks), being packed into a set of bins (the workers), where you're trying to keep the total height of the bins as even as possible.

With more information, we could make specific recommendations of functions that could work for you.

Edit: example load-balancing functions

Based on your comments, here are some example of simple functions that can give you different balancing behaviour. A basic question is whether you want deterministic or probabilistic behaviour. I'll give a couple of examples of each.

To use the example in the question - there are 4 + 5 + 0 + 7 + 9 = 25 jobs currently assigned. You want to pick who gets job 26.

1) Simple task farm. For each job, always pick the worker with the least jobs currently pending. Fast workers get more to do, but everyone finishes at about the same time.

2) Guarantee fair workload. If workers work at different speeds, and you don't want some doing more than others, then track the number of completed + pending jobs for each worker. Assign the next job to keep this number evenly spread (fast workers get free breaks).

3) Basic linear normalisation. Pick a maximum number of jobs each worker can have. Each worker's workload is normalised to that number. For example, if the maximum number of jobs/worker is 15, then 50 more jobs can be added before you reach capacity. So for each worker the probability of being assigned the next job is

P(A) = (15 - 4)/50 = 0.22  
P(B) = (15 - 5)/50 = 0.2  
P(C) = (15 - 0)/50 = 0.3  
P(D) = (15 - 7)/50 = 0.16  
P(E) = (15 - 9)/50 = 0.12

If you don't want to use a specific maximum threshold, you could use the worker with the highest current number of pending jobs as the limit. In this case, that's worker E, so the probabilities would be

P(A) = (9 - 4)/20 = 0.25  
P(B) = (9 - 5)/20 = 0.2  
P(C) = (9 - 0)/20 = 0.45 
P(D) = (9 - 7)/20 = 0.1  
P(E) = (9 - 9)/20 = 0

Note that in this case, the normalisation ensures worker E can't be assigned any jobs - he's already at the limit. Also, just because C doesn't have anything to do doesn't mean he is guaranteed to be given a new job (it's just more likely).

You can easily implement the choice function by generating a random number r between 0 and 1 and comparing it to these boundaries. So if r is < 0.25, A gets the job, 0.25< r < 0.45, B gets the job, etc.

4) Non-linear normalisation. Using a log function (instead of the linear subtraction) to weight your numbers is an easy way to get a non-linear normalisation. You can use this to skew the probabilities, e.g. to make it much more likely that workers without many jobs are given more.

The point is, the number of ways of doing this are practically unlimited. What weighting function you use depends on the specific behaviour you're trying to enable. Hopefully that's given you some ideas which you can use as a starting point.

ire_and_curses
I'm fine with that approach; I'm just not just not sure of a good normalization technique to use. I tried a handful of methods (don't remember exactly, sorry - but things like using the mean workload of all workers, the difference in workload between a worker and the highest workload, etc.), but I didn't like the numbers across varying sets of data. For example, if I have 50 workers with loads up to 15, I might end up with little difference between a load of 1 and a load of 15.I'm sure this is simple and I'm just not seeing the solution. My stats and OR classes were a long time ago! :-)
@unknown (google): Ok. I've added a few specific examples to my answer. Hope this is helpful!
ire_and_curses
Very helpful, thanks. Quick question, though: how did you come up with the value 20 in this example:P(A) = (9 - 4)/20 = 0.25P(B) = (9 - 5)/20 = 0.2[...]
It's the number of free job slots available (9 is the maximum in that example): (9-4) + (9-5) + (9-0) + (9-7) + (9-9) = 20
ire_and_curses
Most excellent; thanks!