views:

84

answers:

1

I have around 100 computers and few workers on each of them. The already connect to a central database to query for job parameters.

Now I have to do job scheduling for them. One job for one worker takes few minutes, doesn't require network connection (except for dealing jobs and reporting) and can be done at any time in any order.

Constraints:

  • no job will be taken/done twice
  • there will be a timeeout in case the worker dies.

I thought that I could use a separate table in DB to schedule jobs. How to create and access job scheduling table?

+2  A: 

Break it down into pieces:

You have a job description - which may have some constraints on where its performed - and you have a queue of jobs to be performed, broadly speaking if there are no other constraints you'd expect the jobs to be done in order i.e. you take them from the front of the queue and add them to the end.

If we run the queue as a single table then jobs will have 3 states:

  • Not Started
  • In Progress
  • Completed

So looking for work to do is simply a matter of finding the first job (or the first n jobs if work is to be assigned in a batch) that are not started and then flagging them as started. If the constraints are more complex than simply the first n available jobs then it becomes the first n available jobs that meet the constraints, but it should still be fairly simple.

That implies the following fields in the queue table:

  • Status
  • DateQueued (date and time) for sort
  • DateStarted (date and time) for timeout
  • AssignedTo

One should probably add a DateCompleted - but if the work is batched that's not going to be strictly accurate (it'll be time reported as complete).

So for a worker (the worker "app") the process becomes:

  1. Connect to server
  2. Report completed work - set status and completion time
  3. Request new work
    1. Search for new work for worker (first n jobs not started that worker can do)
    2. Assign new work to worker (set status, Date Started and assigned to) - search and assign as a transaction.
  4. List work and disconnect

Separately you'd need processes to queue work, to look for jobs that have "timed out" so that the status can be reset and to archive or otherwise clear out completed jobs from the queue.

Full table would have the following plus any audit fields required.

  • ID
  • JobID -- Assuming that jobs are defined elsewhere
  • StatusID
  • DateQueued
  • DateStarted
  • AssignedToID
  • DateCompleted

Hope that helps...

Murph
Sounds about right. Bigg thanks for your work.One last thing is how to write a query to make appropriate things atomic?
Łukasz Lew
You have to wrap things up in transactions - exactly how will depend on both the server and the development tools you're using.
Murph