Heuristic algorithm for load balancing among threads.

views:

208

answers:

+6 Q:

Heuristic algorithm for load balancing among threads.

I'm working on a multi-threaded program where I have a number of worker threads performing tasks of unequal length. I want to load-balance the tasks to ensure that they do roughly the same amount of work. For each task T_i I have a number c_i which provides a good approximation to the amount of work that is required for that task.

I'm looking for an efficient (O(N) N = number of tasks or better) algorithm which will give me "roughly" a good load balance given the values of c_i. It doesn't have to be optimal, but I would like to be able to have some theoretical bounds on how bad the resulting allocations are.

Any ideas?

+1 A:

In O(N) this seems easy.

Give each thread some "points". Let p_i the points allocated to thread T_i. For each task, choose the thread with the highest p_i, and subtract the task cost from p_i. You then just need to keep track of the threads ordered by score, which is trivial in O(N) time, and can easily be done in O(log N) with a balanced tree.

For continuous operation, there is no minimum in p_i. If you want to avoid scores going rogue towards -inf, just regularly add an arbitrary amount P to all scores (the same amount for all scores).

Edit: I got the wrong N. Above, N is the number of threads, contrary to the question asked. With N = number of tasks, and T = number of threads, this leads to O(N*log T) cost. If T is "small", this is close to O(N).

Edit 2: If all tasks are known in advance, as well as the number of threads, then I think that computing the optimal scheduling is akin to the knapsack problem and it is, in all generality, NP-complete (so you will get exponentials somewhere). A simple cost-based analysis as I somehow describe above will give you a relatively good approximation as long as all individual tasks have a small cost with regards to the total cost to be allocated to each thread.

Thomas Pornin 2010-03-15 12:53:05

This sounds interesting and surprisingly trivial. I'll think about it and get back to you.

Il-Bhima 2010-03-15 13:02:17

+1 A:

I'd have a look at algorithms for load balancing e.g.

http://www.ieee802.org/3/trunk_study/february98/congdon.pdf

http://publib.boulder.ibm.com/infocenter/cicstg/v6r0m0/index.jsp?topic=/com.ibm.cicstg600.doc/ccllal0292.htm

Rob Stevenson-Leggett 2010-03-15 12:53:10

+2 A:

Simplest way is to sort jobs descending by p_i (but this is O(n log n)) and do this:

For each thread we have est. run time e_n = 0.
For each task i find thread that has minimal e_n enque task on it and e_n = e_n + p_i.

This algorithm should give you best results but with O(N*M) time where N is number of tasks and M number of threads. Total cost of solution is O(N log N + N*M), so for M << N is O(N log N) and for M near N is O(n^2).

Migol 2010-03-15 13:03:04

+2 A:

My inclination is not to try to figure out ahead of time how to assign the tasks, but to throw them all into a common work queue. Any worker thread with nothing else to do grabs the next task from the queue does the work and checks the queue for the next task.

Adrian McCarthy 2010-03-15 13:25:21

+1 but you might get heavy lock contention on the shared task pool if you have many threads. The system should be tuned to make sure that threads are not constantly waiting for a lock to grab a new task. This can be achieved by making the tasks big enough or by letting threads grab more than 1 task at a time. The ParallelFx library goes even further by having both global and local work pools, and adding "work stealing" into the mix: http://en.wikipedia.org/wiki/Parallel_Extensions

Wim Coenen 2010-03-15 14:15:57

This is exactly what I'm doing now, but having one thread execution per task would incur some non-trivial overheads caused by the threads terminating and being re-assigned tasks. If I find no faster solution this is what I'll go with, but I'm basically trying to find a way to assign > 1 task to each thread beforehand

Il-Bhima 2010-03-15 14:17:59

@Wim: Whether you have contention depends, in part, on the locking primitives you use (and is still likely to be far less expensive than trying to schedule tasks to particular threads). If you use a semaphore whose count is the number of tasks in the queue, you wake up just enough threads. You could use a lock-free queue as well. If you have LOTS of threads and LOTS of tasks, you could use n queues to reduce contention and just assign tasks to queues in a round-robin fashion.

Adrian McCarthy 2010-03-15 17:13:35

@Il-Bhima: Starting a thread for each task is certainly a lot of overhead. That's why I have a fixed pool of threads that keep going back to the queue for another task.

Adrian McCarthy 2010-03-15 17:14:36

@Adrian: Yes what I meant was that I have a thread pool which blocks on a counting semaphore and each threads picks up another job as soon as it finishes. You're really making me wonder whether any scheduling algorithm will be much better than doing what you're saying.

Il-Bhima 2010-03-15 21:42:30

+1 A:

While the suggestion regarding the knapsack problem is helpful, you said that you are trying to minimize the net time of execution. Taking the knapsack approach would require you to keep increasing your knapsack's sizes until you get a feasible solution - not very efficient.

If the net time of execution is limited by the longest completion time among all threads working in parallel, I want to assign tasks so I MINIMIZE the MAXIMUM work time across all threads. Doing this may result in one or more threads that don't do a lot of work, so we're not really "balancing" the work. If you want to balance the work, then that's a different objective function. For instance, you might want to minimize the variance in work among threads.

Look in the area of job shop scheduling. If you only do this infrequently, I'd suggest using a genetic algorithm - if you have to do it frequently and in a more automated manner, I'd suggest doing a little literature searches for deterministic algorithms. Hope this helps.

Grembo 2010-03-15 19:40:40

+2 A:

You want to implement a work stealing algorithm. Each worker thread has a double ended queue, new tasks are added to the bottom of the smallest queue. Workers remove tasks from the top of their own queue (the top/bottom separation reduces contention), when a worker has no more jobs to do, it steals a job off the bottom of the largest queue. It's simple, and works well, this is the algorithm which the Microsoft parallel system coming with .net4.0 is based on I believe.

The resulting allocation is pretty good, worker threads will only be left with no work to do if there are no more jobs available in the entire system.

Nb. If you want some example code to tear apart, my friend wrote a work stealing system for C#, which you can find here

Martin 2010-03-15 20:53:04

This is the solution I went for. Am currently considering migrating my code to Cilk which provides a work-stealing scheduler.

Il-Bhima 2010-03-18 21:05:36

wow, that looks like a pretty interesting language. Glad I could help :)

Martin 2010-03-18 21:09:00

ansaurus

tags:

views:

answers:

Heuristic algorithm for load balancing among threads.

related questions