views:

686

answers:

5

I have an Event table that specifies a date range with start_date and end_date fields. I have another date range, specified in code, that defines the current week as 'week_start' and 'week_end'.

I'd like to query all Events for the week. The cases seem to be:

  • Event begins and ends within the week
  • Event begins before the week, but ends within the week
  • Event begins within the week, but ends after the week
  • Event begins before the week and also ends after the week
  • Events that neither reside within, nor overlap the week at all are ignored

I'm attempting to come up with a query that can handle all these cases. So far I've only been able to get cases that handle the week overlaps, or events that are fully internal; Essentially, too many records, or none at all.

+14  A: 
(event.start BETWEEN week.start AND week.end)
OR
(week.start BETWEEN event.start AND event.end)

In simple words, either a week starts during the event, or an event starts during the week.

Let's check it:

Event begins and ends within the week

The event starts during the week.

Event begins before the week, but ends within the week

The week starts during the event.

Event begins within the week, but ends after the week

The event starts during the week.

Event begins before the week and also ends after the week

The week starts during the event.

Note that BETWEEN in expressions above is used just for the sake of brevity.

Strict expression looks like this:

(event.start >= week.start AND event.start < week.end)
OR
(week.start >= event.start AND week.start < event.end)

, provided that week.end is a week.start + INTERVAL 7 DAY.

I. e. if you week starts of Sun, 0:00:00, then it should end on next Sun, 0:00:00 (not on Sat, 0:00:00)

This expression looks more complex than the one which is commonly used:

event.start < week.end AND event.end > week.start

, but the former is more efficient and index friendly.

See these articles in my blog for performance comparisons:

Quassnoi
+1 for use of between
Tom
Between here is just for brevity. As a rule, week end should be a strict less.
Quassnoi
Too bad this won't solve the problem of the events that are begin before the week and end after the week.
Pop Catalin
@Pop Catalin: Really? Doesn't the week in question start during the event?
Quassnoi
You're right I've misread your example, +1 then.
Pop Catalin
your answer is correct, but I'm using Django and, as such, can only query left-to-right, so to speak. So now I have to translate everything after the OR into something that suites the "field__operator=value" notation that Django uses.
Soviut
-1 This is complicated and inefficient. The best solution is in fact event.start < week.end AND event.end > week.start as proposed by Pop Catalin.
John Machin
@john: how do you think, which condition is more selective and hence more efficient: START_DATE< WEEK_END or START_DATE BETWEEN WEEK_START AND WEEK_END ?
Quassnoi
Which is more selective depends on what indexes are available. In any case it's a loaded question; your first BETWEEN is of the form "column1 BETWEEN param1 AND param2" which is nice and fast if there's an index on column1, but the second is of the form "param1 BETWEEN column1 AND column2" which a query optimiser has to turn into "column1 >= param1 AND column2 <= param1" -- not so nice, not so fast; THEN it still has to do the OR part. ALL that the simple solution has to do is of the form "column1 >= parm1 and column2 <= parm2".
John Machin
@John: do I get you right: you think that more selective conditions lead to less efficient query in a database? I. e. COLUMN1 >= PARAM1 AND COLUMN2 <= PARAM1 is less efficient than COLUMN2 <= PARAM2 ?
Quassnoi
@Quassnoi: No I don't think that. You are still not comparing like with like. Your query simplifies down to "event.start BETWEEN p1 and p2 OR (event.start <= p1 AND event.end >= p1)". Everybody else's query is simply "event.start <= p2 and event.end >= p1". Yours can't be faster. In a likely scenario (years of data, queries skewed towards more recent weeks), you would probably want to reformulate to "event.end BETWEEN p1 and p2 OR p2 BETWEEN event.start AND event.end" with index on event.end; as proposed, with index on event.start, "event.start <= p1" is slow.
John Machin
@John: see the article behind the link in the post for performance comparison
Quassnoi
@Quassnoi: (1) Your test data limits evt_start to be <= 2009-06-30 but query week is first week in July! Result: only three of the 6 possible combinations are tested. Please change to more realistic e.g. first week in JUNE (2) Using sqlite3, simple query runs in 0.21 secs (used the (evt_end, evt_start) index); complex qry did full table scan !? in 107 secs; using union etc used appropriate index for each part and ran in 0.53 seconds. (3) On your website the "ask me a question" link throws err msg -- pls email sjmachin at lexicon dot net; no rush, midnight here in AU :-)
John Machin
@John: 1) The events in the sample last from 1 to 35 days, 18 days in average, so July week overlaps plenty of them. But since SQL Server and Oracle use full scans, this doesn't matter anyway. MySQL uses index scan, so it matters, and for MySQL I did a test with an earlier date (which is even in May): http://explainextended.com/2009/07/01/overlapping-ranges-mysql/
Quassnoi
@John: 2) I don't use SQLite, so I'll take your word on that.
Quassnoi
@John: 3) I fixed the link and got your message, thanks for dropping a line :)
Quassnoi
+3  A: 

You could write your condition like this:

start_date <= week_end AND end_date >= week_start

Edit: this assumes start_date <= end_date and week_start <= week_end ( are properly ordered) and gives you the best performance on most db implementations due to not using OR (which on some databases may create performance issues)

Edit2: this solution also solves the problem of events that begin before the interval and end after the interval.

Pop Catalin
Much nicer solution than between for interval comparison
Alex B
A: 

In order...

where start_date >= week_start and end_date <= week_end
where start_date <= week_start and end_date >= week_start and end_date <= week_end
where start_date >= week_start and start_date <= week_end and end_date > week_end
where start_date < week_start and end_date > week_end
Adam Robinson
A: 

(end2 >= start1) && (start2 <= end1) I think would return true for any intersecting date ranges.

I found a discussion about this here that I found useful.

Brandon
+2  A: 

+1 for pop Catalin, but alas I have no voting privilege.

The restrict condition you want is just the standard way to express Allen's "OVERLAPS" operator.

Additional SQL caveat : if end_date is nullable, be sure to treat nulls in those columns as "the end of time".

Additional functional caveat : be sure to adapt the usage of '<=' versus '<' to whether or not the recorded time periods include the end date or not.