tags:

views:

74

answers:

4

(Apologies for the title of this question - I wasn't overly sure how to explain it)

Not sure whether this can be done in SQL. Below is a (somewhat truncated) sample of an event log table.

EVENT      ID          DATE      TIME
---------  ----------  --------  ----
ONE_THING  0006241800  20091109  1719
ONE_THING  0006944800  20091109  1720
ANOTHER    0007517110  20091109  1721
ANOTHER    0007214240  20091109  1721
ANOTHER    0006907900  20091109  1725
ANOTHER    0006501580  20091109  1727
ONE_THING  0006944800  20091109  1737
ANOTHER    0005749820  20091109  1737
ANOTHER    0006810500  20091109  1738
ANOTHER    0007481970  20091109  1738
ANOTHER    0006331740  20091109  1739
ANOTHER    0007253840  20091109  1739
ANOTHER    0006929280  20091109  1747
ANOTHER    0007297950  20091109  1749
ANOTHER    0005055560  20091109  1751
ANOTHER    0006092320  20091109  1751
ONE_THING  0001668720  20091109  1753
ONE_THING  0007218000  20091109  1754

I'm looking to find groups of instances of ANOTHER where no other event has occurred in the group and for time periods in excess of 2 minutes.

So, in the above dataset, the first group would be:

ANOTHER    0007517110  20091109  1721
ANOTHER    0007214240  20091109  1721
ANOTHER    0006907900  20091109  1725
ANOTHER    0006501580  20091109  1727

and the second would be:

ANOTHER    0005749820  20091109  1737
ANOTHER    0006810500  20091109  1738
ANOTHER    0007481970  20091109  1738
ANOTHER    0006331740  20091109  1739
ANOTHER    0007253840  20091109  1739
ANOTHER    0006929280  20091109  1747
ANOTHER    0007297950  20091109  1749
ANOTHER    0005055560  20091109  1751
ANOTHER    0006092320  20091109  1751

And ideally I'd like to get:

ANOTHER    0007517110  20091109  1721
ANOTHER    0006501580  20091109  1727

and:

ANOTHER    0005749820  20091109  1737
ANOTHER    0006092320  20091109  1751

Or even better still:

EVENT      DATE      TIME_START  TIME_END
---------  --------  ----------  --------
ANOTHER    20091109  1721        1727
ANOTHER    20091109  1737        1751

I thought about comparing rows but maybe there's a better way? I'd appreciate any tips for this. The solution just needs to work - it doesn't need to be necessarily fancy or elegant.

PS> I'm using Oracle.

A: 
SELECT  *
FROM    (
        SELECT  m.*, LEAD(event) OVER (ORDER BY date, time) AS ne, LAG(event) OVER (ORDER BY date, time) AS pe
        FROM    mytable m
        )
WHERE   event = 'ANOTHER'
        AND (ne <> event OR pe <> event)
Quassnoi
A: 

This should get you started. Note that if you have multiple events with the same date and time (as you do in your sample) this is non-deterministic. You could add ID to the ORDER BY clauses as a tiebreaker if you think that makes sense.

SELECT * FROM (
SELECT event, id, date, time,
       lag(event) over (order by date, time) previous_event,
       lead(event) over (order by date, time) next_event
)
WHERE event='ANOTHER'
  AND ( event <> previous_event OR event <> next_event )
ORDER BY date, time
Dave Costa
+1  A: 

Hi Umber, this should work:

SQL> SELECT event, MIN(dt), MAX(dt) FROM (
  2     SELECT event, dt,
  3            SUM(discontinuity) over(ORDER BY dt, event) continuous_group
  4       FROM (SELECT event, dt,
  5                     CASE
  6                        WHEN lag(event) over(ORDER BY dt, event) = event THEN
  7                         0
  8                        ELSE
  9                         1
 10                     END discontinuity
 11                FROM DATA)
 12     )
 13   WHERE event = 'ANOTHER'
 14  GROUP BY event, continuous_group;

EVENT     MIN(DT)       MAX(DT)
--------- ------------- -------------
ANOTHER   20091109 1738 20091109 1751
ANOTHER   20091109 1721 20091109 1737

Note: the events at 17:37 are synchronous and my query put the ANOTHER event into the first set arbitrarily. You can control this behaviour with the ORDER BY clause of the analytic function.

Vincent Malgrat
Cheers for this - it does work.
Umber Ferrule
+1  A: 

This is extending Vincent's answer to include the requirement that the group must be at least 2 minutes in length:

select event, tm_start, tm_stop
from (select event, min(when) tm_start, max(when) tm_stop
      from (select event,
                   when,
                   sum(discontinuity) over(order by when, event) continuous_group
              from (select event,
                           when,
                           case
                             when lag(event)
                              over(order by when, event) = event then
                              0
                             else
                              1
                           end discontinuity
                      from temp_stack ts))
     where event = 'ANOTHER'
     group by event, continuous_group)
where tm_stop - numtodsinterval(2, 'MINUTE') > tm_start;
Dan
+1 for the 2 minute interval addition. Thanks.
Umber Ferrule