views:

81

answers:

2

I have the following Oracle query:

SELECT id,
       DECODE(state, 'Open', state_in, NULL) AS open_in,
       DECODE(state, 'Not Open', state_in, NULL) AS open_out,
FROM (
       SELECT id,
              CASE WHEN state = 'Open'
                   THEN 'Open'
                   ELSE 'Not Open'
              END AS state,
              TRUNC(state_time) AS state_in
       FROM ...
     )

This gives me data like the following:

id  open_in              open_out
1   2009-03-02 00:00:00
1                        2009-03-05 00:00:00
1   2009-03-11 00:00:00
1                        2009-03-26 00:00:00
1                        2009-03-24 00:00:00
1                        2009-04-13 00:00:00

What I would like is data like this:

id  open_in              open_out
1   2009-03-02 00:00:00  2009-03-05 00:00:00
1   2009-03-11 00:00:00  2009-03-24 00:00:00

That is, keep all the unique pairs of id/open_in and pair with them the earliest open_out that follows open_in. There can be any number of unique open_in values for a given id, and any number of unique open_out values. It is possible that a unique id/open_in will not have a matching open_out value, in which case open_out should be null for that row.

I feel like some analytic function, maybe LAG or LEAD, would be useful here. Perhaps I need MIN used with a PARTITION.

+1  A: 

I think Stack Overflow must be inspirational, or at least it helps me think clearer. After struggling with this thing all day, I finally got it:

SELECT id,
       open_in,
       open_out
FROM (
       SELECT id,
              open_in,
              LAG(open_out, times_opened) OVER (PARTITION BY id
                                                ORDER BY open_out DESC
                                                NULLS LAST) AS open_out
       FROM (
              SELECT id,
                     open_in,
                     open_out,
                     COUNT(DISTINCT open_in) OVER (PARTITION BY id)
                       AS times_opened
              FROM (
                     SELECT id,
                            DECODE(state, 'Open', state_in, NULL) AS open_in,
                            DECODE(state, 'Not Open', state_in, NULL)
                              AS open_out
                     FROM (
                            SELECT id,
                                   CASE WHEN state = 'Open'
                                        THEN 'Open'
                                        ELSE 'Not Open'
                                   END AS state,
                                   TRUNC(au_time) AS state_in
                            FROM ...
                          )
                   )
            )
     )
WHERE open_in IS NOT NULL

Update: looks like this doesn't completely work. It works fine with the example in my question, but when there are multiple unique id's, the LAG stuff gets shifted and dates don't always align. :(

Sarah Vessels
+3  A: 

It can be done a little bit simpler. First let's create a sample table:

SQL> create table mytable (id,state,state_time)
  2  as
  3  select 1, 'Open', date '2009-03-02' from dual union all
  4  select 1, 'Closed', date '2009-03-05' from dual union all
  5  select 1, 'Open', date '2009-03-11' from dual union all
  6  select 1, 'Shut down', date '2009-03-26' from dual union all
  7  select 1, 'Wiped out', date '2009-03-24' from dual union all
  8  select 1, 'Demolished', date '2009-04-13' from dual
  9  /

Table created.

The data equals the output of your select statement:

SQL> SELECT id,
  2         DECODE(state, 'Open', state_in, NULL) AS open_in,
  3         DECODE(state, 'Not Open', state_in, NULL) AS open_out
  4  FROM (
  5         SELECT id,
  6                CASE WHEN state = 'Open'
  7                     THEN 'Open'
  8                     ELSE 'Not Open'
  9                END AS state,
 10                TRUNC(state_time) AS state_in
 11         FROM mytable
 12       )
 13  /

        ID OPEN_IN             OPEN_OUT
---------- ------------------- -------------------
         1 02-03-2009 00:00:00
         1                     05-03-2009 00:00:00
         1 11-03-2009 00:00:00
         1                     26-03-2009 00:00:00
         1                     24-03-2009 00:00:00
         1                     13-04-2009 00:00:00

6 rows selected.

And here is the slightly easier query:

SQL> select id
  2       , min(case when state = 'Open' then state_time end)  open_in
  3       , min(case when state != 'Open' then state_time end) open_out
  4    from ( select id
  5                , state
  6                , state_time
  7                , max(x) over (partition by id order by state_time) grp
  8             from ( select id
  9                         , state
 10                         , state_time
 11                         , case state
 12                           when 'Open' then
 13                             row_number() over (partition by id order by state_time)
 14                           end x
 15                      from mytable
 16                  )
 17         )
 18   group by id
 19       , grp
 20   order by id
 21       , open_in
 22  /

        ID OPEN_IN             OPEN_OUT
---------- ------------------- -------------------
         1 02-03-2009 00:00:00 05-03-2009 00:00:00
         1 11-03-2009 00:00:00 24-03-2009 00:00:00

2 rows selected.

Regards, Rob.

Rob van Wijk
Oh, that's much nicer! And actually, I discovered a bug in mine where if there are multiple unique `id`'s, the `LAG` stuff gets off and dates don't align correctly. :( I'm going to try yours out in my full, unrestricted query, but I think it's going to work. Thanks!
Sarah Vessels
Could you add your explanation of your query? I'm working through it trying to understand what it's doing, but it'd be nice to hear what your thinking was.
Sarah Vessels
To understand what it's doing, I recommend that you execute the three parts from the inside out. In words: the innermost query assigns different numbers to only the 'Open' records. These numbers represents the groups. The 'Not Open' records should now be assigned to one of these groups. This is done with the max analytic function. Now the six records have a grp value of 1,1,3,3,3 and 3 that are used in the final group by. Hope this helps.
Rob van Wijk