I've been working on a comprehensive build system that performs distributed builds on multiple machines for quite some time now. It correctly handles dependencies and seemed to scale reasonably well, so we've added more projects and more machines, but it looks like it could perform better.

The problem I have is one of resource allocation. I have a list of available machines and a list of projects I'd like to build. Each machine lists what software, OS, compiler version, etc. is installed, and each project lists what it requires. When work needs to be assigned, I can run a database query that lists the possible assignments. Now I need to perform those assignments as effectively as possible.

The smallest example is two projects 1 and 2 with two machines A and B. Machine A can build either project but machine B can only build project 1. So I end up with a list of pairs (A,1), (A,2), (B,1). If I process the assignments in order, machine A builds project 1 and I have to wait until it finishes before I can build project 2. It perhaps would have been better to assign machine A to project 2 and machine B to project 1. But... machine A may be much faster than machine B, and not using machine B at all may be the right answer.
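To make the example concrete, here is a minimal sketch of in-order assignment. The function name `assign_in_order` is hypothetical, and it assumes each machine builds one project at a time.

```python
# Feasible (machine, project) pairs, in the order the query returns them.
pairs = [("A", 1), ("A", 2), ("B", 1)]

def assign_in_order(pairs):
    """Take pairs in order, skipping any machine or project already used."""
    used_machines, used_projects, result = set(), set(), []
    for machine, project in pairs:
        if machine not in used_machines and project not in used_projects:
            result.append((machine, project))
            used_machines.add(machine)
            used_projects.add(project)
    return result

in_order = assign_in_order(pairs)                              # only (A, 1) is assigned
reordered = assign_in_order([("A", 2), ("B", 1), ("A", 1)])    # both projects assigned
```

The same three pairs give one assignment or two depending purely on the order in which they are processed.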

I'm sure this is the sort of 'operational research' problem that's been addressed many times before. I don't necessarily need an optimal solution... just an attempt at something better than I have - it seems I often end up with tasks queued and machines idle which a better allocation could have avoided. Any suggestions most welcome.

A: 

At first thought, I'd recommend running a Windows Service on each machine, with one machine also running the Master service to coordinate the assignments. The Master service polls each machine to see whether it is processing an assignment; if it is not, that machine begins processing the first assignment in the queue that it is capable of processing.

hmcclungiii
You didn't answer the question...
derobert
From the last paragraph of his post: "I don't necessarily need an optimal solution... just an attempt at something better than I have - it seems I often end up with tasks queued and machines idle which a better allocation could have avoided. Any suggestions most welcome." I made an attempt.
hmcclungiii
Much appreciated. I've currently got a good way to assign 'a' task, but I do have some choice over which to assign - at the moment I am just as likely to make a good choice as a bad one.
Chris
+2  A: 

To get started, my preference is a "pull" model.

Each machine pulls tasks from the central server when it's idle.

The central server provides a kind of priority queue, with the packages in dependency order. Each machine makes a request from the central server and is allocated some work to do.

You have a kind of pooling model, where you have task classifications, and pools of machines that have matching classifications. Machines in pool 1, for example, can build certain things. Machines in pool 2 can build anything. Think of them as "skills" and you'll see how this is a kind of project management issue.
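A minimal sketch of the pull model with "skills" might look like the following. The names `Task`, `BuildServer`, `pull`, and `finish` are hypothetical, as are the example capabilities; it assumes the pending list is already in priority order and that a machine reports its capabilities as a set when it asks for work.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    requires: frozenset                              # capabilities the project needs
    depends_on: set = field(default_factory=set)     # names of tasks that must finish first

class BuildServer:
    def __init__(self, tasks):
        self.pending = list(tasks)   # assumed to be in priority order already
        self.done = set()            # names of finished tasks

    def pull(self, capabilities):
        """Return the first pending task that is ready and that this machine can build."""
        for task in self.pending:
            ready = task.depends_on <= self.done
            if ready and task.requires <= capabilities:
                self.pending.remove(task)
                return task
        return None                  # nothing suitable right now; the machine stays idle

    def finish(self, task):
        self.done.add(task.name)

server = BuildServer([
    Task("core", frozenset({"gcc"})),
    Task("gui", frozenset({"gcc", "qt"}), {"core"}),
    Task("docs", frozenset({"latex"})),
])
caps = {"gcc", "qt"}
first = server.pull(caps)     # "core": ready and buildable
second = server.pull(caps)    # None: "gui" waits on "core", "docs" needs latex
server.finish(first)
third = server.pull(caps)     # "gui" is now ready
```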

If you have really slow machines, you have to hand-optimize them into a separate pool so they only get small branch tasks that have no dependencies.

That may be all you need. However, if you want further optimization, here's your next step.

After you have run it a few times -- and have some expectations for performance -- you can then write a module which attempts to keep each machine as busy as possible. This scheduling is precisely what things like Microsoft Project do.

Given tasks with durations and dependencies, you are attempting to do "resource leveling". You want each resource (compile client in your case) as busy as possible, consistent with each client's skill set and productivity.
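One simple way to approximate this, once you have duration estimates, is a greedy "earliest finish time" pass. This is only a sketch of one common heuristic, not what any particular tool does; the function name `schedule`, the speed factors, and the durations are all assumptions, and dependencies are ignored for brevity.

```python
def schedule(tasks, machines, can_build):
    """
    tasks:     {task: estimated duration on a reference machine}
    machines:  {machine: speed factor (2.0 = twice as fast as the reference)}
    can_build: set of feasible (machine, task) pairs
    Returns ({task: machine}, makespan).
    """
    free_at = {m: 0.0 for m in machines}   # time at which each machine becomes idle
    plan = {}
    # Place the longest tasks first so the bottlenecks are not left for last.
    for task in sorted(tasks, key=tasks.get, reverse=True):
        candidates = [m for m in machines if (m, task) in can_build]
        if not candidates:
            raise ValueError(f"no machine can build {task!r}")
        # Choose the machine that would finish this task earliest.
        best = min(candidates, key=lambda m: free_at[m] + tasks[task] / machines[m])
        free_at[best] += tasks[task] / machines[best]
        plan[task] = best
    return plan, max(free_at.values())

can_build = {("A", 1), ("A", 2), ("B", 1)}
durations = {1: 4.0, 2: 10.0}

# Equal machines: project 2 goes to A, project 1 to B.
plan_equal, span_equal = schedule(durations, {"A": 1.0, "B": 1.0}, can_build)

# A is four times faster: both projects go to A and B stays idle.
plan_fast, span_fast = schedule(durations, {"A": 4.0, "B": 1.0}, can_build)
```

The second call shows the case from the question where leaving the slow machine unused is the better choice.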

S.Lott
Thanks for the ideas! I've flipped between pull and push a couple of times - either way, I end up with good machines taking trivial jobs while bad machines get the bottlenecks. "Keep each machine as busy as possible" is an easy metric to add. I'll give it a try - thanks!
Chris
+4  A: 

The problem you are trying to solve is equivalent to the classic Job Shop Scheduling problem. Finding an optimal schedule is NP-hard.

People have invented lots of heuristics to generate schedules, but which ones are good is highly problem-dependent.

A couple of common heuristics are:

  • Schedule the shortest task first.
  • Schedule the most highly constrained task first, e.g., pick the task that can run on the fewest machines first.
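
As a rough sketch of the second heuristic, applied to the (machine, project) pairs from the question: the function name `most_constrained_first` is hypothetical, and it assumes one project per idle machine and ignores durations.

```python
from collections import defaultdict

def most_constrained_first(pairs):
    """Assign one project per idle machine, handling projects with the fewest options first."""
    options = defaultdict(list)
    for machine, project in pairs:
        options[project].append(machine)

    assignment, busy = {}, set()
    # Projects that can run on the fewest machines get first pick.
    for project in sorted(options, key=lambda p: len(options[p])):
        for machine in options[project]:
            if machine not in busy:
                assignment[project] = machine
                busy.add(machine)
                break
    return assignment

result = most_constrained_first([("A", 1), ("A", 2), ("B", 1)])
# Project 2 (one option) takes A first, leaving B for project 1.
```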
David Norman
I'd additionally suggest scheduling the machine(s) with the most unique abilities last.
derobert
NP hard and managerially impossible :)
leppie
I expected NP-hard, so perhaps it's fortunate I don't have to schedule the whole job shop (I might get new machines or jobs). I think I can try both heuristics reasonably quickly, thank you for the suggestions!
Chris
@derobert: "most unique abilities last" sounds like it contradicts "most highly constrained task first"... I'll try both.
Chris
They don't contradict. It just means if you can dispatch job 7 to machine A/B/C (identical) or D (special), don't pick D.
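A sketch of that rule, with a hypothetical `pick_machine` helper and made-up jobs 8 and 9 that only D can run:

```python
def pick_machine(job, idle_machines, can_run):
    """
    can_run: {machine: set of jobs it is able to run}
    Among idle machines able to run `job`, prefer the one with the fewest
    abilities, so the specialised machines stay free for jobs only they can do.
    """
    candidates = [m for m in idle_machines if job in can_run[m]]
    if not candidates:
        return None
    return min(candidates, key=lambda m: len(can_run[m]))

can_run = {"A": {7}, "B": {7}, "C": {7}, "D": {7, 8, 9}}
idle = ["A", "B", "C", "D"]

for_job_7 = pick_machine(7, idle, can_run)   # one of the identical machines, not D
for_job_8 = pick_machine(8, idle, can_run)   # only D qualifies
```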
derobert
@derobert: Gotcha - that makes sense! Two ways of looking at the question, either "find a machine to run this job" or "find work for the machine to do".
Chris