views:

839

answers:

10

I am currently faced with a difficult sorting problem. I have a collection of events that need to be sorted against each other (a comparison sort) and against their relative position in the list.

In the simplest terms I have list of events that each have a priority (integer), a duration (seconds), and an earliest occurrence time that the event can appear in the list. I need to sort the events based on priority, but no event can appear in the list before its earliest occurrence time. Here's an example to (hopefully) make it clearer:

// Psuedo C# code
class Event { int priority; double duration; double earliestTime ; }

void Example()
{
    Event a = new Event { priority = 1, duration = 4.0, earliestTime = 0.0 };
    Event b = new Event { priority = 2, duration = 5.0, earliestTime = 6.0 };
    Event c = new Event { priority = 3, duration = 3.0, earliestTime = 0.0 };
    Event d = new Event { priority = 4, duration = 2.0, earliestTime = 0.0 };

    // assume list starts at 0.0 seconds
    List<Event> results = Sort( new List<Event> { a, b, c, d } );

    assert( results[ 0 ] == a ); // 4.0 seconds elapsed
    assert( results[ 1 ] == c ); // 7.0 seconds elapsed
    assert( results[ 2 ] == b ); // 12.0 seconds elapsed
    assert( results[ 3 ] == d ); // 14.0 seconds elapsed
}

Item "b" has to come last because it isn't allowed to start until 6.0 seconds into the list, so it is deferred and "c" gets to go before "b" even though its priority is lower. (Hopefully the above explains my problem, if not let me know and I'll edit it.)

My current idea is to use an insertion sort to manage the sorting process. Unlike many of the other common sorting algorithms, insertion sort decides the order of the list one at a time and in order. So for each index I should be able to find the next lowest priority event whose earliest occurrence time will be satisfied.

I'm hoping to find resources about sorting algorithms and data structures to help me design a good solution for this "sort" of problem. My real problem is actually more complex than this: hierarchical sorting, variable buffers between events, multiple non-constant time constraints, so the more information or ideas the better. Speed and space are not really a concern. Accuracy in sorting and maintainability of the code are a concern.

Edit: Clarifications (based on comments)

  • Events consume their entire duration (that is there is no overlap of events allowed)
  • Events must occur at or after their earliestTime, they cannot occur before their earliestTime.
  • Events can occur later than their earliestTime if lower priority events exist
  • Events cannot be interrupted
  • There is a maximum duration the sum of all events that can fit in a list. This is not shown above. (In reality the duration of all events will be far greater than the time list's maximum duration.)
  • There cannot be any gaps. (There are no holes to try and back fill.)

Edit: Answer

While David Nehme gave the answer I selected, I wanted to point out that his answer is an insertion sorts at heart, and several other people provided insertions sort type answers. This confirms for me that a specialized insertion sort is probably the way to go. Thanks to all of you for your answers.

A: 

I think you should sort the list twice: first by priority and then by earliest time, using any stable sort algorithm, for instance insertion sort. That way the time will be increasing and for each time things will be sorted by priority.

Unless you see something I don't you can completely ignore the duration of each event for the purpose of the sort.

http://en.wikipedia.org/wiki/Category:Stable_sorts

jakber
I can't ignore the duration, as the current time increase as I move forward in time, different events can be added to the list.
ARKBAN
So events cannot happen in parallell?
jakber
I guess not, so if you have lots of low priority tasks that can start at 0, but one high priority task that must start a t=4, you can only schedule a few of the low priority tasks first.
Douglas Leeder
+2  A: 

In other words, you want to optimize the overall running time while formulating two constraints (strong: earliest point of execution, weak: priority)? This is called a constraint satisfaction problem. There are special solvers for this kind of problem.

Incidentally, jakber's solution doesn't work. Even without the duration, the following example obviously fails:

event a (priority = 1, start = 5)
event b (priority = 2, start = 0)

The sorted sequence would be a, b while the wanted outcome is surely b, a.

Konrad Rudolph
Could you elaborate on the doesn't work part?
jakber
Thanks, but the second sort would sort by start-time in my solution, thereby producing b,a. If event cannot happen in parallell my solution breaks down however so I guess you have a point anyway.
jakber
+7  A: 

This is actually more than a sorting problem. It's a single-machine scheduling problem with release dates. Depending on what you are trying to do, the problem might be NP-Hard. For example, if you are trying to mimimize the weighted-sum of the completion times (the weight being inversely proportional to the priority), the the problem is categorized as

1|ri;pmtn|Σ wiCi

and is NP-hard. There are numerous papers on this topic, but it might be more than what you need.

In your case, you never want a solution with gaps, so what you might just need to do is a simple discrete-event simulation ( O(n log(n)) ) time. You need to store released_jobs as a priority queue.

unreleased_jobs = jobs  // sorted list of jobs, by release date
released_jobs = {}      // priority queue of jobs, by priority
scheduled_jobs = {}     // simple list
while (!unreleased_jobs.empty() || !released_jobs.empty()) {

    while (unreleased_jobs.top().earliestTime  <= t) {
        released_jobs.push(unreleased_jobs.pop())
    }
    if (!released_jobs.empty()) {
       next_job = released_jobs.pop();
       scheduled_jobs.push_back(next_job)
       t = t + next_job.duration
    } else {
       // we have a gap
       t = unreleased_jobs.top().earliestTime
    }
}

One problem is that you might have a low-priority job with a release time just before a short, high-priority job, but it will produce a schedule with the property that there are no gaps (if a schedule with no gaps is possible).

David Nehme
What do you mean by "single-machine"? As opposed to a distributed computing type problem?
ARKBAN
I mean this is akin to scheduling jobs on a single processor. I'm not referring to the code you will write to solve this.
David Nehme
I suspect you're right. But it might depend in the weighting is always to start high-priority as soon as possible (my answer), or always avoid gaps, which might allow a different linear solution.
Douglas Leeder
Or maybe not linear, but at least P.
Douglas Leeder
Can you recommend any resources for handling such a problem? (Someone else posted a link to the Constraint Satisfaction Problem page on Wikipedia.)
ARKBAN
A: 

It sounds like you do actually do want a comparison-based sort. Your sort key is {earliestTime, priority}, in that order. Since your example is pseudo-C#, I'll give you a pseudo-C# solution:

class Event : IComparable<Event>, IComparable{
    int    priority;
    double duration;
    double earliestTime;

    public int CompareTo(Event other){
        if(other == null)
            return 1; /* define: non-null > null */

        int cmp = earliestTime.CompareTo(other.earliestTime);
        if(cmp != 0)
            return cmp;

        /* earliestTimes were equal, so move on to next comparison */
        return priority.CompareTo(other.priority);
    }

   int IComparable.CompareTo(object other){ /* for compatibility with non-generic collections */
       if(other == null)
            return 1; /* define: non-null > null */

       Event e_other = other as Event;
       if(e_other == null) /* must have been some other type */
           throw new ArgumentException("Must be an Event", "other");

       return CompareTo(e_other); /* forward to strongly-typed implementation */
   }
}

Now your list will sort just as your asserts expect.

EDIT:

My initial presumption was that events would be picked off the list and handed off to a separate thread, so that the queue manager could fire off the next event on time, but from comments I received, I got the idea that perhaps an approach that was single-threaded, yet still allowed higher-priority events to fire off as close as possible to their start time was more desirable. In that case, the CompareTo function should change as follows:

public int CompareTo(Event other){
    if(other == null)
        return 1; /* define: non-null > null */

    int cmp = priority.CompareTo(other.priority);

    if(cmp == 0)
        /*
         * calculate and compare the time each event will be late
         * if the other one were to start first.  This time may be
         * negative if starting one will not make the other one late
         */
        return (earliestTime + duration - other.earliestTime).CompareTo(
            other.earliestTime + other.duration - earliestTime);

    /*
     * they're different priorities. if the lower-priority event
     * (presume that greater priority index means lower priority,
     * e.g. priority 4 is "lower" priority than priority 1), would
     * would make the higher-priority event late, then order the
     * higher-priority one first.  Otherwise, just order them by
     * earliestTime.
     */
    if(cmp < 0){/* this one is higher priority */
        if(earliestTime <= other.earliestTime)
            /* this one must start first */
            return -1;

        if(other.earliestTime + other.duration <= earliestTime)
            /* the lower-priority event would not make this one late */
            return 1;

        return -1;
    }

    /* this one is lower priority */
    if(other.earliestTime <= earliestTime)
        /* the other one must start first */
        return 1;

    if(earliestTime + duration <= other.earliestTime)
        /* this event will not make the higher-priority one late */
        return -1;

    return 1;
}

Test this against any assumptions, but I think it's what we're looking for.

P Daddy
That's not going to work if you've got lots of low priority tasks that'll delay a high priority task that can only start later.
Douglas Leeder
It cannot be a straight comparison-based sort. You can't decide whether Event1 < Event2 or Event2 < Event1 if you don't know what comes before Event1 and Event2.
Federico Ramponi
I think you're both referring to the duration of lower priority earlier events delaying the start of higher priority later events. I had presumed a multi-threaded approach to fire off events at their due time (no matter what was already running), but I'll edit to suppose a single-threaded approach.
P Daddy
A: 

If you have a limited set of priority levels, you could keep a set of time-sorted lists, 1 for each level. Whenever you need the next event, check the head of each list in priority order until you find one whose start time has passed. (Keep track of the minimum start time while you check - in case no event is ready yet, you know which one to wait for)

AShelly
A: 

Sounds like a problem I was having the other day, which was answered here.
Assuming you're using C#...

Byron Ross
Its not the same problem, as the duration and earliest times are not values in a void. They are constraints that can be met based on what is already in the list.
ARKBAN
+2  A: 

I think:

  1. Sort tasks by priority
  2. Fit tasks into a time-line, taking the first available slot after their earliestTime, that has a hole big enough for the task.

Convert the time-line into a list a tasks, and waits (for the gaps).

Questions:

  1. Are gaps allowed?
  2. Can tasks be split?
  3. Given the tasks as in the question: is it better to delay b to complete c, or do d so that b can start on time?

Edit:

Os the answers to my questions are:

  1. No (ish - if there is nothing to run I guess we could have a gap)
  2. No
  3. Still not clear, but I guess the example suggests run c and delay b.

In this case the algorithm might be:

  1. Sort by priority
  2. Keep a counter for the current 'time' starting with t=0
  3. Search though the sorted list, for the highest priority item that can be started at t.
  4. Add the item to the running order, and add its duration to t.
  5. Repeat 3&4 till the list is exhausted. If there are no tasks runnable at t, and there are tasks remaining pending, stick a 1-second sleep task in the running order.

This algorithm is also O(n^2).

Douglas Leeder
That's going to be O(n^2) in the worst case, isn't it?
Federico Ramponi
Yep, O(n^2) I think. Still better than NP.
Douglas Leeder
Your algorithm is the insertion sort technique I described. (Its good to see someone else coming to a similar idea.)
ARKBAN
A: 

Incidentally, in the most general case there may be no solution (unless gaps are allowed, as Douglas pointed out). For example:

Event a = new Event { priority = 1, duration = 1.0, earliestTime = 4.0 };
Event b = new Event { priority = 2, duration = 1.0, earliestTime = 4.0 };
Event c = new Event { priority = 3, duration = 1.0, earliestTime = 4.0 };
Event d = new Event { priority = 4, duration = 1.0, earliestTime = 4.0 };
Federico Ramponi
You are correct. (In the real problem, the list's start time would be set to 4.0, and all events would be scheduled.)
ARKBAN
A: 

I'm not entirely certain I understand the intricacies of your problem, but my instinct tells me you need to define a relationship between priority and start time. The example would be:

Event a = new Event { priority = 1, duration = 4.0, earliestTime = 1.0 };
Event b = new Event { priority = 2, duration = 5.0, earliestTime = 0.0 };

So, do we go ahead and start b at time = 0, or do we wait for a tick and then start a because it's higher priority? Suppose there were more events with more priorities and longer time tradeoffs. I think you need a rule along the lines of "if the next event is X higher priority and the gap (between now and the earliest time) is less than Y seconds, wait and then start the higher priority event. otherwise start the low priority event (thus pushing back the high priority one)".

rmeador
You would go ahead with event "b", and delay "a" till after "b" is done.
ARKBAN
A: 

Here is some Python code along the lines of Douglas's answer. First we sort by priority, then we fit a timeline in a selection-sort fashion:

#!/usr/bin/env python
MIN_PRIORITY = 100

class Event(object):
    def __init__(self, name, priority, duration, earliestTime):
     self.name = name
     self.priority = priority
     self.duration = duration
     self.earliestTime = earliestTime
    def __str__(self):
     return "%-10s:  P %3d  D %3.1f  T %3.1f" % (self.name, self.priority, self.duration, self.earliestTime)

def sortEvents(_events):
    def comparePriority(event1, event2):
     if event1.priority < event2.priority: return -1
     if event1.priority > event2.priority: return 1
     return 0

    # Get a copy of the events and sort by priority
    events = [e for e in _events]
    events.sort(cmp=comparePriority)

    # Select one event at a time, checking for compatibility with elapsed time
    elapsedTime = 0.0
    sortedEvents = []
    while events:
     minGap = events[0].earliestTime - elapsedTime
     for e in events:
      currentGap = e.earliestTime - elapsedTime
      if currentGap < minGap:
       minGap = currentGap
      if currentGap <= 0.0:
       sortedEvents.append(e)
       elapsedTime += e.duration
       events.remove(e)
       break

     # If none of the events fits, add a suitable gap
     if minGap > 0:
      sortedEvents.append( Event("gap", MIN_PRIORITY, minGap, elapsedTime) )
      elapsedTime += minGap
    return sortedEvents

if __name__ == "__main__":
    #e1 = Event("event1", 1, 1.0, 4.0)
    #e2 = Event("event2", 2, 1.0, 6.0)
    #e3 = Event("event3", 3, 1.0, 8.0)
    #e4 = Event("event4", 4, 1.0, 10.0)

    e1 = Event("event1", 1, 4.0, 0.0)
    e2 = Event("event2", 2, 5.0, 6.0)
    e3 = Event("event3", 3, 3.0, 0.0)
    e4 = Event("event4", 4, 2.0, 0.0)

    events = [e1, e2, e3, e4]

    print "Before:"
    for event in events: print event
    sortedEvents = sortEvents(events)
    print "\nAfter:"
    for event in sortedEvents: print event
Federico Ramponi