views:

3559

answers:

8

Given two date ranges, what is the simplest or most efficient way to determine whether the two date ranges overlap?

As an example, suppose we have ranges denoted by DateTime variables StartDate1 to EndDate1 and StartDate2 to EndDate2.

+12  A: 

I believe that it is sufficient to say that the two ranges overlap if:

(StartDate1 <= EndDate2) and (StartDate2 <= EndDate1)
Ian Nelson
+1  A: 

I woud do

StartDate1.IsBetween(StartDate2, EndDate2) || EndDate1.IsBetween(StartDate2, EndDate2)

Where IsBetween is something like

 public static bool IsBetween(this DateTime value, DateTime left, DateTime right) {
  return (value > left && value < right) || (value < left && value > right);
 }
Bob
Patrick Huizinga
Thanks for this. Makes things easier in my head.
sshow
Why would you check four conditions when you only have to check two? Fail.
Emtucifor
Ah, my apologies, I see now that you are allowing the ranges to be in reverse order (StartDateX > EndDateX). Strange. Anyway, what if StartDate1 is less than StartDate2 and EndDate1 is greater than EndDate2? The code you gave will not detect this overlapping condition.
Emtucifor
A: 

The easiest way to do it in my opinion would be to compare if either EndDate1 is before StartDate2 or EndDate2 is before StartDate1.

That of course if you are considering intervals in the future and not the past ( StartDate always before EndDate)

AlexDrenea
+56  A: 

Let CondA Mean DateRange A Completely After DateRange B (True if StartA > EndB)

Let CondB Mean DateRange A Completely Before DateRange B (True if EndA < StartB)

Then Overlap exists if Neither A Nor B is true ( If one range is neither completely after the other, nor completely before the other, then they must overlap)

Now deMorgan's law, I think it is, says that

Not (A Or B) <=> Not A And Not B

Which means (StartA <= EndB) And (EndA >= StartB)

NOTE: This includes conditions where the edges overlap exactly. If you wish to exclude that, change the >= operators to >, and <= to <

Charles Bretana
Excellent rigorous explanation. Thanks.
Ian Nelson
Edited to fix typo... Mixed up As and Bs !
Charles Bretana
Thank you! Its much easier to indicate what NOT to accept versus what to accept.
Soviut
Thank you, that's the best explanation I've found.
Evgeny
+5  A: 

For reasoning about temporal relations (or any other interval relations, come to that), consider Allen's Interval Algebra. It describes the 13 possible relations that two intervals can have with respect to each other. You can find other references - "Allen's Interval" seem to be the operative search terms. You can also find information about these operations in Snodgrass's "Developing Time-Oriented Applications in SQL" (PDF available online at URL), and in Date, Darwen and Lorentzos "Temporal Data and the Relational Model" (see Amazon.com, etc).


Emtucifor comments:

You can only get 13 if you count things funny... I can get "15 possible relations that two intervals can have" when I go crazy with it. By sensible counting, I get only six, and if you throw out caring whether A or B comes first, I get only three (no intersect, partially intersect, one wholly within other). 15 goes like this: [before:before, start, within, end, after], [start:start, within, end, after],[within:within, end, after], [end:end, after], [after:after].

I think that you cannot count the two entries 'before:before' and 'after:after'. I could see 7 entries if you equate some relations with their inverses (see the diagram in the referenced Wikipedia URL; it has 7 entries, 6 of which have a different inverse, with equals not having a distinct inverse). And whether three is sensible depends on your requirements.

----------------------|-------A-------|----------------------
    |----B1----|
           |----B2----|
               |----B3----|
               |----------B4----------|
               |----------------B5----------------|
                      |----B6----|
----------------------|-------A-------|----------------------
                      |------B7-------|
                      |----------B8-----------|
                         |----B9----|
                         |----B10-----|
                         |--------B11--------|
                                      |----B12----|
                                         |----B13----|
----------------------|-------A-------|----------------------
Jonathan Leffler
You can only get 13 if you count things funny... I can get "15 possible relations that two intervals can have" when I go crazy with it. By sensible counting, I get only six, and if you throw out caring whether A or B comes first, I get only three (no intersect, partially intersect, one wholly within other). 15 goes like this:[before:before, start, within, end, after], [start:start, within, end, after],[within:within, end, after], [end:end, after], [after:after].
Emtucifor
@Emtucifor: I think that you cannot count the two entries 'before:before' and 'after:after'.
Jonathan Leffler
Re your update: B1 to A is before:before and B13 to A is after:after. Your nice diagram is missing start:start between B5 B6, and end:end between B11 and B12. If being on an endpoint is significant, then you have to count it, so the final tally is 15, not 13. I *don't* think the endpoint thing is significant, so I personally count it [before: before, within, after], [within: within, after], [after:after] which comes to 6. I think the whole endpoint thing is just confusion about whether the bounds are inclusive or exclusive. The exclusiveness of the endpoints don't change the core relations!
Emtucifor
That is, in my scheme the these are equivalent: (B2, B3, B4), (B6, B7, B9, B10), (B8, B11, B12). I realize that B7 implies the information that the two ranges exactly coincide. But I'm not convinced this *additional* information should be part of the base intersection relations. For example, when two intervals are the exact same length even if not coincident or even overlapping, should that be considered another "relation"? I say no, and seeing as this additional aspect is the only thing making B7 distinct from B6, then I think having endpoints-as-separate-cases makes things inconsistent.
Emtucifor
@Emtucifor: OK - I see why I misidentified 'before:before' and 'after:after' as the entries; however, I cannot picture what the 'start:start' and 'end:end' entries should look like. Since you can't edit my diagram, can you email me (see my profile) with a modified copy of the diagram showing the 'start:start' and 'end:end' relationships? I have no major issues with your groupings.
Jonathan Leffler
start:start would be starting and ending at the same time, the beginning of A. Since time in computers always has a granularity or resolution, this still denotes a (possibly very short) time range. end:end would be starting and ending at the end of A. I'll do the email for completeness.
Emtucifor
After an email exchange with Jonathan, I realized that in my model where the start time is inclusive and the end time is exclusive, a range beginning and stopping at the same time is zero length (because it ends before it starts). So [start:start] and [end:end] don't make sense in my scheme. However, the scheme Jonathan suggests requires that the endpoints are inclusive, otherwise ranges B2 and A (for example) would not intersect. This to me is problematic because times in a computer are not represented to infinite precision. See next comment ->
Emtucifor
If the end time is inclusive, then to denote non-overlapping but contiguous ranges in a computer, one has to subtract the smallest time increment allowed from the end time of the earlier segment. In SQL Server with the datetime data type, for example, the ranges A:'20100101 09:00' - '20100101 10:00' and B:'20100101 10:00' - '20100101 11:00' would overlap by 1/300th of a millisecond. To make them contiguous and non-overlapping, the first range would have to be adjusted to A:'20100101 09:00' - '20100101 09:59:59.997'. But this is completely unacceptable for multiple reasons (Next ->)
Emtucifor
The first and less important reason is that people don't think of times this way. They want to say that the range begins at 9 and ends at 10. If you're collecting data from a user then you either have to present the 09:59:59.997 garbage (this would never work) or adjust all the times when you store them (not much better). The second and more important reason is that the underlying data type shouldn't change the business meaning of data. What iff the datetime column was converted to datetime2 (http://technet.microsoft.com/en-us/library/bb677335.aspx) which has 00:00:00 through 23:59:59.9999999?
Emtucifor
Now there is a gap between the end of A and the start of B. The only sensible solution to all this madness is to just to treat end times as exclusive. Once this is done, the 13 ranges mentioned by Jonathan completely break down. There is no special difference between two non-overlapping ranges that are adjacent or separated. After removing collapsed or equivalent cases, we are left with only 6 basic relations (3 if A and B are considered interchangeably).
Emtucifor
A: 

Here is a generic method that can be usefull locally.

    // Takes a list and returns all records that have overlapping time ranges.
    public static IEnumerable<T> GetOverlappedTimes<T>(IEnumerable<T> list, Func<T, bool> filter, Func<T,DateTime> start, Func<T, DateTime> end)
    {
        // Selects all records that match filter() on left side and returns all records on right side that overlap.
        var overlap = from t1 in list
                      where filter(t1)
                      from t2 in list
                      where !object.Equals(t1, t2) // Don't match the same record on right side.
                      let in1 = start(t1)
                      let out1 = end(t1)
                      let in2 = start(t2)
                      let out2 = end(t2)
                      where in1 <= out2 && out1 >= in2
                      let totover = GetMins(in1, out1, in2, out2)
                      select t2;

        return overlap;
    }

    public static void TestOverlap()
    {
        var tl1 = new TempTimeEntry() { ID = 1, Name = "Bill", In = "1/1/08 1:00pm".ToDate(), Out = "1/1/08 4:00pm".ToDate() };
        var tl2 = new TempTimeEntry() { ID = 2, Name = "John", In = "1/1/08 5:00pm".ToDate(), Out = "1/1/08 6:00pm".ToDate() };
        var tl3 = new TempTimeEntry() { ID = 3, Name = "Lisa", In = "1/1/08 7:00pm".ToDate(), Out = "1/1/08 9:00pm".ToDate() };
        var tl4 = new TempTimeEntry() { ID = 4, Name = "Joe", In = "1/1/08 3:00pm".ToDate(), Out = "1/1/08 8:00pm".ToDate() };
        var tl5 = new TempTimeEntry() { ID = 1, Name = "Bill", In = "1/1/08 8:01pm".ToDate(), Out = "1/1/08 8:00pm".ToDate() };
        var list = new List<TempTimeEntry>() { tl1, tl2, tl3, tl4, tl5 };
        var overlap = GetOverlappedTimes(list, (TempTimeEntry t1)=>t1.ID==1, (TempTimeEntry tIn) => tIn.In, (TempTimeEntry tOut) => tOut.Out);

        Console.WriteLine("\nRecords overlap:");
        foreach (var tl in overlap)
            Console.WriteLine("Name:{0} T1In:{1} T1Out:{2}", tl.Name, tl.In, tl.Out);
        Console.WriteLine("Done");

        /*  Output:
            Records overlap:
            Name:Joe T1In:1/1/2008 3:00:00 PM T1Out:1/1/2008 8:00:00 PM
            Name:Lisa T1In:1/1/2008 7:00:00 PM T1Out:1/1/2008 9:00:00 PM
            Done
         */
    }
staceyw
A: 

I am afraid I have not found a satisfying algorithm that can solve all conditions. Here is the brute force way, it is limited only by the available list of dates kept in a table. Sadly, will also eats disk I/O like hell.


    declare
        @StartDate date,
        @EndDate date

    select
        @StartDate = '1900-01-01',
        @EndDate = '2049-12-31'

    create table Numbers
    (
        Id int identity (1, 1),
        Date date primary key
    )

    ;
    with 
    DateList as -- list of dates
    (
        select @StartDate as Date 
        union all 
        select dateadd(d, 1, Date) from DateList where datediff(d, Date, @EndDate) > 0
    )
    insert
        Numbers
    (
        Date
    )
    select 
        Date 
    from 
        DateList 
    option (maxrecursion 0)

Here is the function that make use of the above table.


alter function FnOverlaps
(
    @StartDateA datetime,
    @EndDateA datetime,
    @StartDateB datetime,
    @EndDateB datetime
)
returns bit
as 
begin
    declare 
        @RetVal bit

    ;
    with 
    ins as
    (
        select Date from Numbers where Date between @StartDateA and @EndDateA
        intersect
        select Date from Numbers where Date between @StartDateB and @EndDateB
    )
    select @RetVal = case when count(*) = 0 then 0 else 1 end from ins
    return @RetVal
end
go

-- test here
select dbo.FnOverlaps('2005-01-01', '2005-12-31', '2004-01-01', '2004-01-31')
Irawan Soetomo
BETWEEN expands to two conditions, so you're using two when you only need four (see most-highly voted answer in this question).
Emtucifor
+2  A: 

All the solutions that check a multitude of conditions based on where the ranges are in relation to one another can be greatly simplified by just ensuring that a specific range starts earlier! You ensure that the first range starts earlier (or at the same time) by swapping the ranges if necessary up front.

Then, you can detect overlap if the other range start is less than or equal to the first range end (if ranges are inclusive, containing both the start and end times) or less than (if ranges are inclusive of start and exclusive of end).

Assuming inclusive at both ends, there's only four possibilities of which one is a non-overlap:

|----------------------|        range 1
|--->                           range 2 overlap
 |--->                          range 2 overlap
                       |--->    range 2 overlap
                        |--->   range 2 no overlap

The endpoint of the range 2 doesn't enter into it. So, in pseudo-code:

def doesOverlap (r1,r2):
    if r1.s > r2.s:
        swap r1, r2
    if r2.s > r1.e:
        return false
    return true

If the ranges are inclusive at the start and exclusive at the end, you just have to replace > with >= in the second if statement:

|----------------------|        range 1
|--->                           range 2 overlap
 |--->                          range 2 overlap
                       |--->    range 2 no overlap
                        |--->   range 2 no overlap

You greatly limit the number of checks you have to make because you remove half of the problem space early by ensuring range 1 never starts after range 2.

paxdiablo
+1 for mentioning the inclusive/exclusive problem. I was going to whip up an answer myself when I had time, but no need now. The thing is you almost never allow both the start and end to be inclusive simultaneously. In my industry it is common practice to treat the start as exlusive and the end as inclusive, but either way is fine as long as you stay consistent. This is the first completely correct answer on this question so far...IMO.
Brian Gideon