views:

193

answers:

6

I'm facing an interesting problem:

  • I have several date ranges that can overlap
  • each of them has a name

Is it possible to "des-overlap" theses ranges? That is, to generate:

  • a new set of ranges where none overlaps the others
  • each of this new range has a list of corresponding names

Maybe I can make this a bit more graphical. This is what I have first:

a   |------------------------------|
b                    |-------------------|
c          |-----------------|

This is what I would like to obtain:

    |------|---------|-------|-----|-----|
        a      a,c     a,b,c   a,b    b

I found a solution that kind of works, but which is not elegant:

  1. I transform each range (from, to) into a list of days (d1, d2, d3, etc.)
  2. I group names by day
  3. I aggregate groups that contain the same names to recreate ranges

Can you think of a better solution? I'm working with C# but any language independent idea would be much appreciated. Thanks!

+5  A: 

I would

  1. Keep an unordered list of "open" ranges
  2. Start from day 1, and add the first range to the open ranges list.
  3. Move until the next range boundary (be it start or close). Create your first "output" range: starting from day 1, ending on that day. In it are the items in your open ranges list.
  4. If the range encountered is already in the open ranges list, remove it. Otherwise, add it.
  5. Repeat 3 and 4, moving along the line.

I definitely did a bad job of explaining it. I'll be writing some code for this soon. But until then, here's what would happen in your solution:

a   |------------------------------|
b                    |-------------------|
c          |-----------------|
1.  Start at day 1, add A to open ranges list, which now contains [A]
2.  Move to the start of C.  First RESULT RANGE: A range from Day 1 to C's start,
      with a value A. (what is in your open ranges list)
3.  Add C to the open ranges list, which now contains [A,C]
4.  Move to the start of B.  Second RESULT RANGE: A range from C's start to B's
      start, with a value A,C. (what is in your open ranges list)
5.  Add B to the open ranges list, which now contains [A,C,B]
6.  Move to the end of C.  Third RESULT RANGE: A range from B's start to C's end,
      with a value of A,C,B.
7.  Remove C from the open ranges list, which now contains [A,B]
8.  Move to the end of A.  Fourth RESULT RANGE: A range from C's end to A's end,
      with a value of A,B
9.  Remove A from the open ranges list, which now contains [B]
10. Move to the end of A.  Fourth RESULT RANGE: A range from A's end to B's end,
      with a value of B

RESULT RANGES
1. from Day 1 to C's start: A
2. from C's start to B's start: A,C
3. from B's start to C's end: A,C,B
4. from C's end to A's end: A,B
5. from A's end to B's end: B

Alternative Method

You could do this:

  1. Keep a ordered list of "output ranges". This makes it easy to test if a point is within a range, and also what ranges follow eachother.
  2. Take a range from your input ranges.
  3. Check to see if it is completely before or completely after all output ranges, and handle accordingly. Skip the next steps and go back to step 2, if so.
  4. Compare its start point to the output ranges
  5. If it is before any other output range, add a new output range from its start to the start of the first output range.
  6. If this is in between an already-existing output range, split that output range at that point. The first part would hold the same "parents", and the second part would have the same parents + the new input range.
  7. Now, compare its end point to the output ranges.
  8. If it is after any other output range, add a new output range from the end of the last output range to its end.
  9. If this is in between an already-existing output range, split that output range at that point. The second part would hold the same "parents", and the first part would have the same parents + the new input range
  10. Add the current input range as a part to all ranges in between the two "processed" ranges in steps 6 and 9, if any.
  11. Repeat 2-6 for all input ranges.

Here are the first few steps, using the sample data + another range D: ("processed" ranges indicated by **double stars**)

a   |------------------------------|
b                    |-------------------|
c          |-----------------|
d       |------------------------|

1.
   Process A
   Output ranges: |--------------A---------------|
2.
   Process B
     - Start of B is in **A**; split A in two:
                  |-------A-------|------AB------|
     - End of B is after any output range range;
         creating new output range at end
                  |-------A-------|------AB------|---B---|
    - Nothing was/is in between **A** and (nothing)
3.
   Process C
     - Start of C is in **A**; split A in two:
                  |---A----|--AC--|------AB------|---B---|
     - End of C is in **AB**; split AB in two:
                  |---A----|--AC--|--ABC--|--AB--|---B---|
     - There were/are no ranges between **A** and **AB**

4.
   Process D
     - Start of D is in **A**; split A in two:
                  |-A-|-AD-|--AC--|--ABC--|--AB--|---B---|
     - End of D is in **AB**; split AB in two:
                  |-A-|-AD-|--AC--|--ABC--|ABD|AB|---B---|
     - Ranges AC and ABC were/are in between **A** and **AB**
                  |-A-|-AD-|--ACD-|-ABCD--|ABD|AB|---B---|

Final output:     |-A-|-AD-|--ACD-|-ABCD--|ABD|AB|---B---|
Justin L.
Thanks for the answer. I've a question regarding the point 6 of your alternative method. I'm not sure I understand. Could you develop?
b. austen
I've elaborated and added a demonstration.
Justin L.
b. austen
I did, thank you. Oops, heh
Justin L.
b. austen
The first solution still handles it correctly, I believe. I made some slight albeit ad-hoc modifications to the second one.
Justin L.
+1  A: 

You may want to have look at Interval Trees.

sfussenegger
+1  A: 

You could:

  1. sort the list of all dates (combining the from and to dates)
  2. then running along this list, each new range would be from one date to the next start or end date that is different from the preceding.

For naming the new ranges, it would make sense to have the list of original range names that the current new range contains, and each time you hit an end date, remove the old range name from the list, and each tiem you hit a start date, add it's name to the list.

Frank
A: 

Do this:

Create an Event class.

class DateEvent : IComparable<DateEvent>
{
    public Date Date;
    public int DateRangeId;
    public bool IsBegin; // is this the start of a range?

    public int CompareTo(DateEvent other)
    {
        if (Date < other.Date) return -1;
        if (Date > other.Date) return +1;
        if (IsBegin && !other.IsBegin) return -1;
        if (!IsBegin && other.IsBegin) return +1;
        return 0;
    }
}

Define a List<DateEvent> events;

Add each range's start and end date into the list:

for (int i = 0; i < dateRanges.Length; ++i)
{
    DateRange r = dateRanges[i];
    events.Add(new DateEvent(r.BeginDate, i, true));
    events.Add(new DateEvent(r.EndDate, i, false));
}

Sort the events.

events.Sort();

Now set up an output class:

class OutputDateRange
{
    public Date BeginDate;
    public Date EndDate;
    public List<string> Names;
}

Finally, traverse the events, maintaining which ranges are present:

List<int> activeRanges;
List<OutputDateRange> output;
Date current = events[0].Date;
int i = 0;

while (i < events.Length;)
{
    OutputDateRange out = new OutputDateRange();
    out.BeginDate = current;

    // Find ending date for this sub-range.
    while (i < events.Length && events[i].Date == current)
    {
        out.EndDate = events[i].Date;
        if (events[i].IsBegin)
            activeRanges.Add(events[i].DateRangeId);
        else
            activeRanges.Remove(events[i].DateRangeId);
        ++i;
    }

    if (i < events.Length)
        current = events[i].Date;

    foreach (int j in activeRanges)
        out.Names.Add(dateRanges[j].Name);

    output.Add(out);
}

That should do the trick. Note that I've not made the constructors, and the code is a bit ugly, but hopefully that conveys the general idea.

Peter Alexander
Hi Peter, thanks for your anwser! I can't see why your testing on the event Date in the second while loop -- it will prevent the exiting of the first one. Could you explain me this part?
b. austen
Oops, was meant to update current after the loop. I'll fix it.
Peter Alexander
It seems there is an error somewhere: we exit from the second while loop the first time...
b. austen
+2  A: 

Hi, I have code that does this. It relies on the input-set being ordered by from and then to (ie. if it was SQL, it would be ORDER BY from_value, to_value), but after that it is quite optimal.

My implementation basically does what @Justin L.'s answer describes, so if you just want a textual description, look at his answer for the algorithm.

The code is here: LVK.DataStructures, and the files you want to look at are:

To use it:

List<Range<DateTime>> ranges = ...
var slices = ranges.Slice();

This will give you one new range for each slice, and for each such range you would have a .Data property containing references back to the contributing ranges.

ie. for your original example, you would get exactly what you want, individual ranges, with the contributing ranges a, b, c etc. in the .Data property.

The classes might use other portions of my class library, which is all there. If you decide to use it, just copy out the portions you want to use.

If you're only interested in the implementation, it can be found in the RangeExtensions.cs file, line 447 and onwards, InternalSlice method.

Lasse V. Karlsen
A: 
  1. I have a list first, i don't know its length, but i got 3 chars
  2. check for one slot, if A there? add 'A', if B there? add 'B', if c there? add 'C'
  3. go to another slot, cycle like #2
  4. end when nothing added to another new slot
  5. I got the list
  6. flatten the list
Elaine