views:

258

answers:

6

I'm wondering what kind(s) of data structures / algorithms might help facilitate handling the following situation; I'm not sure if I need a single FIFO, or a priority queue, or multiple FIFOs.

I have N objects that must proceed through a predefined workflow. Each object must complete step 1, then step 2, then step 3, then step 4, etc. Each step is either done quickly or involves a "wait" that depends on something external to finish (like the completion of a file operation or whatever). Each object maintains its own state. If I had to define an interface for these objects, it would be something like this (written below in pseudo-Java, but this question is language-agnostic):

public interface TaskObject
{
   public enum State { READY, WAITING, DONE };
   // READY = ready to execute next step
   // WAITING = awaiting some external condition
   // DONE = finished all steps

   public int getCurrentStep();
   // returns # of current step

   public int getEndStep();
   // returns # of step which is the DONE case.

   public State getState();
   // checks state and returns it. 
   // multiple calls will always be identical, 
   // except WAITING which can transition to READY or DONE.

   public State executeStep();
   // if READY, executes next step and returns getState().
   // otherwise, returns getState().

}

I need to write a single-threaded scheduler that calls executeStep() on the "next" object. My problem is, I'm not sure exactly what technique I should use to determine what the "next" object is. I want it to be fair (first-come, first-serve for objects not in the WAITING state).

My gut call is to have 3 FIFOs, READY, WAITING and DONE. In the beginning all objects are placed in the READY queue, and the scheduler repeats a loop where it takes the first object off the READY queue, calls executeStep(), and places it onto the queue that's appropriate the the result of executeStep(). Except that items in the WAITING queue need to be put into the READY or DONE queue when their state changes.... argh!

Any advice?

A: 

NOTE - this does not address your question of how to schedule, but I would use a separate state class that defines the states and transitions. The objects should not know what states they should go through. They can be informed of what "Step" they are at, etc.

there are some patterns for that as well.

You should read up a little on operating systems - specifically the scheduler. Your example is a scaled down set of that problem and if you copy the relevant parts it should work great for you.

You can then add priority, etc.

Tim
Just to add, you mean the State pattern From GoF, right?
xandy
@xandy: That works. I have seen other similar implementations that fall under the description of a state pattern.
Tim
+1  A: 

You don't have any way for the task object to notify you when it changes from WAITING to READY except polling it, so the WAITING and READY queues could really just be one. You can just loop around it calling executeStep() on each one in turn. If as a return value from executeStep() you receive DONE, then you remove it from that queue and stick it on the DONE queue and forget about it.

If you wanted to give "more priority" towards READY objects and attempt to run through all possible READY objects before wasting any resources polling WAITING you can maintain 3 queues like you said and only process the WAITING queue when you have nothing in the READY queue.

I personally would spend some effort to eliminate the polling of the state, and instead define an interface that the object could use to notify your scheduler when a state changes.

Greg Rogers
+1  A: 

If this has to be single threaded you can use a single FIFO queue for the ready and waiting objects and use your thread to process each object as it comes out. If it's state changes to WAITING then simply stick it back into the queue and it will be reprocessed.

Something like (psuedocode):

var item = queue.getNextItem();
var state = item.executeStep ();
if (state == WAITING)
    queue.AddItem (item);
else if (state == DONE)
    // add to collection of done objects

Depending on the time executeStep takes to run you may need to introduce a delay (Sleep not for) to prevent a tight polling loop. Ideally you would have the objects publish state change events and do-away with the polling altogether.

This is the kind of timeslicing approach that was commonplace in hardware and comms software before multithreading was widespread.

Alex
This worked very easily, I just had a fairly simple PHP application that I needed to manage some parallel tasks w/o multithreading. Thanks!
Jason S
A: 

The simplest technique that satisfies the requirements in your question is to repeatedly iterate over all TaskObjects calling executeStep() on each one.

This requires only one construct to hold the TaskObjects, and it can be any iterable structure, e.g. an array.

Since a TaskObject can transition from WAITING to READY asynchronously, you have to poll every TaskObject that you don't know is DONE.

The performance gained from not polling the DONE TaskObjects may be negligible. It depends on the processing load of calling executeStep() on a DONE TaskObject, which should be small.

A simple round-robin polling assures that once a READY TaskObject has executed a step, it will not execute another step until all other TaskObjects have had a chance to execute.

One obvious additional requirement is detecting when all TaskObjects are in the DONE state so you can stop processing.

To avoid polling DONE TaskObjects you will need to either maintain a flag for each one, or chain the TaskObjects in two queues: READY/WAITING and DONE.

If you store the TaskObjects in an array, make it an array of records, with members DoneFlag and TaskObject.

If for some reason you are storing the TaskObjects in a queue, with available enqueue() and dequeue() methods, then the overhead of two queues instead of one may be small.

-Al.

A. I. Breveleri
+1  A: 

You might want to study the design of an operating system scheduler. Check out the Linux and *BSD for example.

Some pointers for the Linux scheduler: Inside the Linux scheduler and Understanding the Linux Kernel

florin
A: 

Take a look a this link.

Boost state machines vs uml

Boost has state machines. Why reinvent?

MadH
thanks, but this was just a simple application in PHP.
Jason S
oh, posts with php tag are usually ignored for me
MadH