views:

74

answers:

3

I have a table in Oracle 10 that is defined like this:

LOCATION   HOUR              STATUS
--------------------------------------
10         12/10/09 5:00PM      1
10         12/10/09 6:00PM      1
10         12/10/09 7:00PM      2
10         12/10/09 8:00PM      1
10         12/10/09 9:00PM      3
10         12/10/09 10:00PM     3
10         12/10/09 11:00PM     3

This table continues for various locations and for a small number of status values. Each row covers one hour for one location. Data is collected from a particular location over the course of that hour, and processed in chunks. Sometimes the data is available, sometimes it isn't, and that information is encoded in the status. I am trying to find runs of a particular status, so that I could convert the above table into something like:

LOCATION   STATUS     START               END
-----------------------------------------------------------
10         1          12/10/09 5:00PM     12/10/09 7:00PM
10         2          12/10/09 7:00PM     12/10/09 8:00PM  
10         1          12/10/09 8:00PM     12/10/09 9:00PM
10         3          12/10/09 9:00PM     12/11/09 12:00AM 

Basically condensing the table into rows that define each stretch of a particular status. I have tried various tricks, like using lead/lag to figure out where starts and ends are and such, but all of them have met with failure. The only trick that works so far is going one by one through the values programatically, which is slow. Any ideas for doing it directly in Oracle? Thanks!

+1  A: 

OK, I apologize for not knowing Oracle syntax, but I hope that the below Sybase one is clear enough (I split it into 3 queries creating 2 temp tables for readbility but you can just re-unit as sub-queries. I don't know how to add/subtract 1 hour in Oracle, dateadd(hh...) does it in Sybase

SELECT * FROM T
INTO   #START_OF_PERIODS
WHERE NOT EXISTS (
    SELECT 1 FROM T_BEFORE
    WHERE T.LOCATION = T_BEFORE.LOCATION
      AND T.STATUS   = T_BEFORE.STATUS
      AND T.HOUR     = dateadd(hh, T_BEFORE.HOUR, 1)
   )

SELECT * FROM T
INTO   #END_OF_PERIODS
WHERE NOT EXISTS (
    SELECT 1 FROM T_AFTER
    WHERE T.LOCATION = T_AFTER.LOCATION
      AND T.STATUS   = T_AFTER.STATUS
      AND T.HOUR     = dateadd(hh, T_AFTER.HOUR, -1)
   )

SELECT T1.LOCATION, T1.STATUS, T1.HOUR AS 'START', MIN(T2.HOUR) AS 'END'
FROM   #START_OF_PERIODS 'T1', #END_OF_PERIODS 'T2'
WHERE  T1.LOCATION = T2.LOCATION
  AND  T1.STATUS   = T2.STATUS
  AND  T1.HOUR    <= T2.HOUR
GROUP BY T1.LOCATION, T1.STATUS, T1.HOUR 
    -- May need to add T2.LOCATION, T2.STATUS to GROUP BY???
DVK
If someone know both Sybase and Oracle syntax, please feel free to edit this post to Oraclify it... this is pretty much generic transact SQL so it ought to be easy (SELECT INTO and dateadd() are the only suspects for me)
DVK
Oracle doesn't have a DATEADD function - there's a bit of conversion that would need to be done. See Oracle functions here: http://techonthenet.com/oracle/functions/index.php
OMG Ponies
A: 

Ever thought about a stored procedure? I think that would be the most readable solution.

Basic Idea:

  1. run a select statement that gives you the rown in the right order for one building
  2. iterate over the result line by line and write a new 'run'-record every time the status changes and when reaching the end of the result set.

You need to test if it is also the fastest way. Depending on the number of records, this might not be an issue at all.

Peter Schuetze
I've done something like this in PHP code for reporting, but have not tried it as a stored procedure. My hope was that it was simple to do in straight SQL, but that I was missing something obvious :)
Arcane
I voted this down, beause this solution won't scale. It is very expensive to do the ordering up front, the db will esentially have to scan the entire table and order all rows before it can do any work processing the rows.
Roland Bouman
looking at data itself, the sorting should not be too difficult if you define an index on (location, hour)
Peter Schuetze
+2  A: 

Here's an ANSI SQL solution:

select      t1.location
,           t1.status
,           min(t1.hour)                                      AS "start" -- first of stretch of same status
,           coalesce(t2.hour, max(t1.hour) + INTERVAL '1' HOUR) AS "end"
from        t_intervals t1             -- base table, this is what we are condensing
left join   t_intervals t2             -- finding the first datetime after a stretch of t1
on          t1.location = t2.location  -- demand same location
and         t1.hour     < t2.hour      -- demand t1 before t2
and         t1.status  != t2.status    -- demand different status 
left join   t_intervals t3             -- finding rows not like t1, with hour between t1 and t2
on          t1.location = t3.location
and         t1.status  != t3.status
and         t1.hour     < t3.hour
and         t2.hour     > t3.hour
where       t3.status is null          -- demand that t3 does not exist, in other words, t2 marks a status transition
group by    t1.location                -- condense on location, status.
,           t1.status
,           t2.hour                    -- this pins the status transition
order by    t1.location
,           t1.status
,           min(t1.hour)
Roland Bouman
This appears to do the trick! Thanks!
Arcane