views:

444

answers:

4

Let's say I have a table with a bunch of dates, e.g.:

declare @tbl table {
    idx int primary key,
    startdate datetime,
    enddate datetime
}

And I want to find the largest set of rows where startdate and enddate intersect (in the real world, the start date and end date represents start and end times for events, and I need to find the maximum # of events occurring simultaneously).

In another programming language I might sort all entries by startdate, then iterate through each entry once, building a temporary set of intersections (keeping track of the largest set generated). But I'm not sure if this is the most efficient way to express this in T-SQL. help!

Oh, and it's SQL Server 2000. :(

+2  A: 

try this (it's close to what you want I think...

Select Distinct EventId 
From EventTable Et
Join  (Select Top 1 RunDate, Count(*) DateCount
       From 
          (Select Distinct StartDate RunDate
           From EventTable
               Union  
           Select Distinct EndDate RunDate
           From EventTable) A
         Join EventTable E
            On A.RunDate Between E.StartDate And E.EndDate
       Group By RunDate
       Order By Count(*) Desc) Z
   On Z.RunDate Between Et.StartDate and Et.EndDate

oh, If your dates have date and Time in them, then replace all the dates herein with actual date portion only (strip off the time)

Select Distinct EventId 
From EventTable Et
Join  (Select Top 1 RunDate, Count(*) DateCount
       From 
          (Select Distinct DateAdd(day, 0, DateDiff(day, 0, StartDate)) RunDate
           From EventTable
               Union  
           Select Distinct DateAdd(day, 0, DateDiff(day, -1, EndDate)) RunDate
           From EventTable) A
         Join EventTable E
            On A.RunDate Between DateAdd(day, 0, DateDiff(day, 0, E.StartDate))
                             and DateAdd(day, 0, DateDiff(day, -1, E.EndDate))
       Group By RunDate
       Order By Count(*) Desc) Z
   On Z.RunDate Between DateAdd(day, 0, DateDiff(day, 0, Et.StartDate))
                    and DateAdd(day, 0, DateDiff(day, -1, Et.EndDate))
Charles Bretana
So did this work ??
Charles Bretana
it did, thank you!
Jen A
A: 

Another approach:

DECLARE @idx INT,
        @startdate DATETIME,
    @enddate DATETIME, 
        @prev_enddate DATETIME,
        @counter INT,
    @counter_max INT

DECLARE db_cursor CURSOR FOR  
SELECT idx, startdate,enddate 
FROM @tbl
ORDER BY startdate,enddate

OPEN db_cursor   

FETCH NEXT FROM db_cursor INTO @idx, @startdate, @enddate
SET @prev_enddate = @enddate
SET @counter = 0
SET @counter_max = 0

WHILE @@FETCH_STATUS = 0   
BEGIN   
IF @startdate < @prev_enddate
BEGIN
 SET @counter = @counter + 1 
 IF @counter > @counter_max
 BEGIN
  SET @counter_max = @counter
 END
END
ELSE
BEGIN
 SET @counter = 1
END

SET @prev_enddate = @enddate
FETCH NEXT FROM db_cursor INTO @idx, @startdate, @enddate     
END   

CLOSE db_cursor   
DEALLOCATE db_cursor

SELECT @counter_max
Lukasz Lysik
A: 

This one is pretty short, easy to understand and works fine:

CREATE PROCEDURE FindEvents
AS
BEGIN
    DECLARE dates_cursor CURSOR FOR 
     SELECT
      startdate AS thedate, 1 AS change
     FROM
      dates
     UNION
     SELECT
         enddate AS thedate, - 1 AS change
     FROM
      dates
     ORDER BY 
      thedate ASC;

     DECLARE @max INT;
     DECLARE @thedate DATETIME;
     DECLARE @change INT;
     DECLARE @current INT;

     SET @max = 0;
     SET @current = 0;

    OPEN dates_cursor

    FETCH NEXT FROM dates_cursor INTO @thedate, @change

    WHILE @@FETCH_STATUS = 0
    BEGIN
     SET @current = @current + @change;
     IF (@current > @max)
     BEGIN
      SET @max = @current;
     END
        FETCH NEXT FROM dates_cursor INTO @thedate, @change
    END

    CLOSE dates_cursor
    DEALLOCATE dates_cursor

    SELECT @max;
END
Maras
I think that looping with a cursor is slower than having a query doing it
MaxiWheat
+2  A: 

Updated to remove the union all

declare @tbl table (
idx int identity(1,1) primary key,    
startdate datetime,    
enddate datetime);

insert into @tbl (startdate, enddate) 
select '2009-01-01', '2009-01-05'
union all select '2009-01-02', '2009-01-04'
union all select '2009-01-01', '2009-01-03'
union all select '2009-01-03', '2009-01-06'
union all select '2009-01-04', '2009-01-07'
union all select '2009-01-05', '2009-01-08'

select idx, startdate
   , (select sum(in_or_out) 
from (
   select case when startdate<=all_events.startdate then 1 else 0 end
     + case when enddate <= all_events.startdate then -1 else 0 end as in_or_out
   from @tbl 
   where startdate <= all_events.startdate
     or enddate <= all_events.startdate) as previous
) as concurent
from @tbl all_events
order by startdate

This gives the timeline of start session, with the count of concurent sessions at the moment new session starts:

idx startdate concurent
3   2009-01-01 00:00:00.000 2
1   2009-01-01 00:00:00.000 2
2   2009-01-02 00:00:00.000 3
4   2009-01-03 00:00:00.000 3
5   2009-01-04 00:00:00.000 3
6   2009-01-05 00:00:00.000 3

To get the original request (set of concurent sessions with max concurency) you need to run this query twice, once to get the max concurent sessions and once to get the start dates of the sessions that have max concurent times, then you must get those sessions.

Updated

OK, so here the one single query that retrieves the max concurent sessions. I changed the test data to remove ambibuos overlaps of end and start:

declare @tbl table (
idx int identity(1,1) primary key,    
startdate datetime,    
enddate datetime);

insert into @tbl (startdate, enddate) 
select '2009-01-01', '2009-01-04 23:59:59'
union all select '2009-01-02', '2009-01-03 23:59:59'
union all select '2009-01-01', '2009-01-02 23:59:59'
union all select '2009-01-03', '2009-01-03 23:59:59'
union all select '2009-01-04', '2009-01-04 23:59:59'
union all select '2009-01-05', '2009-01-05 23:59:59'


select max_concurent_starts.startdate as concurentdate
  , session.*
from (
  select *
  ,(
     select sum(in_or_out) 
     from (
      select case when startdate<=all_events.startdate then 1 else 0 end
       + case when enddate <= all_events.startdate then -1 else 0 end 
       as in_or_out
       from @tbl 
       where startdate <= all_events.startdate
        or enddate <= all_events.startdate) as previous
    ) as concurent
  from @tbl all_events) as max_concurent_starts
  join @tbl as session 
     on session.startdate <= max_concurent_starts.startdate 
     and session.enddate >= max_concurent_starts.startdate
  where concurent = (
  select top 1 concurent
  from (
      select (
       select sum(in_or_out) 
       from (
        select case when startdate<=all_events.startdate then 1 else 0 end
         + case when enddate <= all_events.startdate then -1 else 0 end 
         as in_or_out
         from @tbl 
         where startdate <= all_events.startdate
          or enddate <= all_events.startdate) as previous
      ) as concurent
    from @tbl all_events) as all_events_with_concurent
    order by concurent desc)
  order by concurentdate, startdate;

This gives a result like:

concurentdate   idx startdate enddate
2009-01-02 00:00:00.000 3 2009-01-01 00:00:00.000 2009-01-02 23:59:59.000
2009-01-02 00:00:00.000 1 2009-01-01 00:00:00.000 2009-01-04 23:59:59.000
2009-01-02 00:00:00.000 2 2009-01-02 00:00:00.000 2009-01-03 23:59:59.000
2009-01-03 00:00:00.000 1 2009-01-01 00:00:00.000 2009-01-04 23:59:59.000
2009-01-03 00:00:00.000 2 2009-01-02 00:00:00.000 2009-01-03 23:59:59.000
2009-01-03 00:00:00.000 4 2009-01-03 00:00:00.000 2009-01-03 23:59:59.000

which reads as follows: on 2009-01-02 00:00:00 there were 3 concurent sessions (3, 1 and 2) with they respective starts and ends. There is a tie, on 2009-01-03 00:00:00 there were also 3 concurent sessions (1, 2 and 4) with their respective starts and ends.

Performance milage may vary. The query can be written 1 million times simpler in SQL 2005 using CTEs.

Remus Rusanu
sweet: nice idea to flatten the ranges into a simple list
van
in case you wonder why mine is so fat compared with charles's is because of the handling of ties.
Remus Rusanu
brilliant use of sum()! thanks!
Jen A