views:

145

answers:

1

I've been investigating different scheduling algorithms for a thread pool I am implementing. Due to the nature of the problem I am solving I can assume that the tasks being run in parallel are independent and do not spawn any new tasks. The tasks can be of varying sizes.

I went immediately for the most popular scheduling algorithm "work stealing" using lock-free deques for the local job queues, and I am relatively happy with this approach. However I'm wondering whether there are any common cases where work-stealing is not the best approach.

For this particular problem I have a good estimate of the size of each individual task. Work-stealing does not make use of this information and I'm wondering if there is any scheduler which will give better load-balancing than work-stealing with this information (obviously with the same efficiency).

NB. This question ties up with a previous question.

+2  A: 

I'd distribute the tasks upfront. With the information of their estimated running time you can distribute them into individual queues, for each thread one.

Distributing the tasks is basically the knapsack problem, each queue should take the same amount of time.

You should add some logic to modify the queues while they run. For example a re-distribution should occur after the estimated running time differs by a certain amount from the real running time.

Georg
The other complication, which might or might not apply, is how to handle a change of availability of parts of the underlying parallel execution framework. That doesn't matter so much when you're running on a small dedicated host (or cluster) but with large clusters you can't rely on nodes staying up. The easiest fix is to rerun the scheduler if that changes, and to remember to reschedule workloads that have started running on the failed nodes too; it's better to do work twice than to lose it.
Donal Fellows
@Donal: Good point about node availability, although in this case (threads in a single process) I won't have to think about it much.@Georg: This is what I was considering. In terms of locking and CAS calls, just distributing tasks upfront should be cheaper. What I am worried about it the quality of the actual load balance.
Il-Bhima
Nice when you don't have that worry. :-) Couldn't tell on the basis of info given though, so thanks for the clarification. (I know people who have worked on the more general problem for their PhD, and it's really hard. Especially if you have prioritization and fixed reservations in the mix too.)
Donal Fellows