tags:

views:

630

answers:

4

This is an SQL problem I can't wrap my head around in a simple query Is it possible?

The data set is (letters added for ease of understanding):

Start      End
10:01      10:12     (A)
10:03      10:06     (B)
10:05      10:25     (C)
10:14      10:42     (D)
10:32      10:36     (E)

The desired output is:

PeriodStart   New       ActiveAtEnd    MinActive   MaxActive
09:50         0         0              0           0
10:00         3 (ABC)   2 (AC)         0           3 (ABC)
10:10         1 (D)     2 (CD)         1 (C)       2 (AC or CD)
10:20         0         1 (D)          1 (C)       2 (CD)
10:30         1 (E)     1 (D)          1 (D)       2 (DE)
10:40         0         0              0           1 (D)
10:50         0         0              0           0

So, the query needed is a summary of the first table, calculating the minimum overlapping time periods (Start-End) and the maximum overlapping time periods (Start-End) from the first table within a 10 minute period.

'New' is the number of rows with a Start in the summary period. 'ActiveAtEnd' is the number of rows active at the end of the summary period.

I'm using Oracle, but I'm sure a solution can be adjusted. Stored procedures not allowed - just plain SELECT/INSERT (views are allowed). Its also OK to run one SQL command per 10 minute output (as once populated, that will be how it keeps up to date.

Thanks for any ideas, including 'not possible' ;-)

A: 

The New and ActiveAtEnd are fairly straightforward (assuming the the period's start and end being stored in temporary variables):

select  @periodStart PeriodStart
, @periodEnd PeriodEnd 
, n.[new]
, ae.ActiveAtEnd
from (
select count(*) [new] 
from @times 
where [start] >= @periodStart
and [start] < @PeriodEnd 
) n 
cross join 
(
select count(*) [ActiveAtEnd] 
from @times
where [start] < @PeriodEnd 
and [end] >= @PeriodEnd 
) ae

The Max and Min Actives are harder. You can presume a minute's granularity, so you would need to explode out active period at that granularity to be able to probe into each slice.

I'm not sure that that's possible in a single query.

Unsliced
A: 

The only way that I have ever been able to solve this sort of problem has been to get the count of 'start' for each one minute period. You then get the maximum (or minimum) for the 10 minute group. I have not been able to apply a set based approach.

Anthony K
+1  A: 

I'm struggling with the ActiveAtEnd value, but the others are OK.

This is for MySQL:

set @active:=0;

select 
  period, 
  sum( if( score=1, 1, 0)) New, 
  if( max(ab) > max(aa), max(ab), max(aa)) MaxActive, 
  if( min( ab ) < min( aa ), min(ab), min(aa)) MinActive  
from  (
        select 
          period, 
          etime, 
          score, 
          @active ab, 
          @active:=@active+score aa 
        from (
               select 
                 from_unixtime( floor( unix_timestamp(start)/600) * 600) period, 
                 start etime, 
                 +1 score 
               from ev  
               union all
               select from_unixtime( floor( unix_timestamp(end)/600) * 600) period, 
                 end etime, 
                 -1 score
               from ev 
             ) event order by etime
      ) as temp 
group by period;

The innermost selection breaks the original table into a set of events - with a score of +1 for a start-event, and -1 for an end event. union all is used so that duplicate events are allowed.

The next inner selection runs a variable across the score values - @active holds a count of the number of active intervals at each time point. Both the value of @active before and after the current count is added is selected: I do not know how portable this is.

The outermost selection accumulates the results for each period. 'New' is the sum of '+1' scores, MaxActive and MinActive must both take the value of active before (ab) and active after (aa) into account.

Here are sample results:

+---------------------+------+-----------+-----------+
| period              | New  | MaxActive | MinActive |
+---------------------+------+-----------+-----------+
| 2008-11-19 10:00:00 |    3 |         3 |         0 |
| 2008-11-19 10:10:00 |    1 |         2 |         1 |
| 2008-11-19 10:20:00 |    0 |         2 |         1 |
| 2008-11-19 10:30:00 |    1 |         2 |         1 |
| 2008-11-19 10:40:00 |    0 |         1 |         0 |
+---------------------+------+-----------+-----------+
Martin
+2  A: 

Assuming you also have (or Create) a table named @Times with one record for each ten minute start time, How about...

   Select T.Start,  
      (Select Count(*) From testTab
       Where Start Between T.Start 
          And DateAdd(minute, 10, T.Start)) New,
      (Select Count(*) From testTab
       Where Start < DateAdd(minute, 10, T.Start)
          And EndDt > DateAdd(minute, 10, T.Start)) ActiveAtEnd,
      (Select Max(Cnt) From 
          (Select Count(Distinct T.Which) Cnt
           From  (Select Distinct Start
                    From testTab
                    Where Start Between T.Start 
                           And DateAdd(minute, 10, T.Start)
                    Union Select T.Start 
                    Union Select DateAdd(minute, 10, T.Start)) Z
             Left Join testTab T 
               On Z.Start Between T.Start And T.EndDt
           Group By Z.Start) ZZ ) MaxActive,
      (Select Min(Cnt) From 
           (Select Count(Distinct T.Which) Cnt
            From  (Select Distinct Start
                   From testTab
                   Where Start Between T.Start 
                           And DateAdd(minute, 10, T.Start)
                   Union Select T.Start                                                         
                   Union Select DateAdd(minute, 10, T.Start)) Z
              Left Join testTab T 
                 On Z.Start Between T.Start And T.EndDt
           Group By Z.Start) ZZ ) MinActive  
   From @Times T

I Created this table in SQL Server as a Table variable, using

Declare @Times Table (Start datetime Primary key Not Null)
Declare @Start DateTime 
Set @Start = '1 Nov 2008 10:00'
While @Start < '1 Nov 2008 11:00' begin
    Insert @Times(Start) values(@Start)
    Set @Start = DateAdd(minute, 10, @Start) 
End

If you are using another product, use a temp table instead... but this approach does need a table with one record for each ten minute "period" as a hook to run against...

with the following data, this query generates output as follows:

start                   endDt                   Which
----------------------- ----------------------- -----
2008-11-01 10:01:00.000 2008-11-01 10:12:00.000 A
2008-11-01 10:03:00.000 2008-11-01 10:06:00.000 B
2008-11-01 10:05:00.000 2008-11-01 10:25:00.000 C
2008-11-01 10:14:00.000 2008-11-01 10:42:00.000 D
2008-11-01 10:32:00.000 2008-11-01 10:36:00.000 E
2008-11-01 10:22:00.000 2008-11-01 10:51:00.000 F
2008-11-01 10:22:00.000 2008-11-01 10:23:00.000 G

Start                   New         ActiveAtEnd MaxActive   MinActive
----------------------- ----------- ----------- ----------- -----------
2008-11-01 10:00:00.000 3           2           3           0
2008-11-01 10:10:00.000 1           2           2           2
2008-11-01 10:20:00.000 2           2           4           2
2008-11-01 10:30:00.000 1           2           3           2
2008-11-01 10:40:00.000 0           1           2           1
2008-11-01 10:50:00.000 0           0           1           0

Warning: Null value is eliminated by an aggregate or other SET operation.

Charles Bretana