I use Oracle 10g and have a table that stores a snapshot of a person's data for a given day. Every night an outside process adds a new row to the table for any person who has had any change to their core data (stored elsewhere). This allows a query to be written using a date to find out what a person 'looked' like on some past day. A new row is added even if only a single aspect of the person has changed, so many columns carry duplicate values from slice to slice, since not every detail changes in each snapshot.

Below is a data sample:

SliceID  PersonID   StartDt  Detail1    Detail2  Detail3  Detail4 ...
      1       101  08/20/09      Red    Vanilla     N          23
      2       101  08/31/09   Orange  Chocolate     N          23
      3       101  09/15/09   Yellow  Chocolate     Y          24
      4       101  09/16/09    Green  Chocolate     N          24
      5       102  01/10/09     Blue      Lemon     N          36
      6       102  01/11/09   Indigo      Lemon     N          36
      7       102  02/02/09   Violet      Lemon     Y          36
      8       103  07/07/09      Red     Orange     N          12
      9       104  01/31/09   Orange     Orange     N          12
     10       104  10/20/09   Yellow     Orange     N          13

I need to write a query that pulls out the time-slice records where some pertinent bits, not the whole record, have changed. So, referring to the above, if I only want to know the slices in which Detail3 has changed from its previous value, then I would expect to get only SliceID 1, 3 and 4 for PersonID 101, SliceID 5 and 7 for PersonID 102, SliceID 8 for PersonID 103, and SliceID 9 for PersonID 104.
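
For anyone who wants to reproduce the sample, here is a minimal setup sketch. The table name some_table and the column types are assumptions made for illustration, and only the four detail columns shown above are included (the real table has more).

```sql
-- Hypothetical setup: table name and column types are assumptions.
CREATE TABLE some_table (
  sliceid   NUMBER       PRIMARY KEY,
  personid  NUMBER       NOT NULL,
  startdt   DATE         NOT NULL,
  detail1   VARCHAR2(20),
  detail2   VARCHAR2(20),
  detail3   CHAR(1),
  detail4   NUMBER
);

-- One INSERT per row, since multi-row VALUES lists are not available in 10g.
INSERT INTO some_table VALUES ( 1, 101, DATE '2009-08-20', 'Red',    'Vanilla',   'N', 23);
INSERT INTO some_table VALUES ( 2, 101, DATE '2009-08-31', 'Orange', 'Chocolate', 'N', 23);
INSERT INTO some_table VALUES ( 3, 101, DATE '2009-09-15', 'Yellow', 'Chocolate', 'Y', 24);
INSERT INTO some_table VALUES ( 4, 101, DATE '2009-09-16', 'Green',  'Chocolate', 'N', 24);
INSERT INTO some_table VALUES ( 5, 102, DATE '2009-01-10', 'Blue',   'Lemon',     'N', 36);
INSERT INTO some_table VALUES ( 6, 102, DATE '2009-01-11', 'Indigo', 'Lemon',     'N', 36);
INSERT INTO some_table VALUES ( 7, 102, DATE '2009-02-02', 'Violet', 'Lemon',     'Y', 36);
INSERT INTO some_table VALUES ( 8, 103, DATE '2009-07-07', 'Red',    'Orange',    'N', 12);
INSERT INTO some_table VALUES ( 9, 104, DATE '2009-01-31', 'Orange', 'Orange',    'N', 12);
INSERT INTO some_table VALUES (10, 104, DATE '2009-10-20', 'Yellow', 'Orange',    'N', 13);

COMMIT;
```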

I'm thinking I should be able to use some sort of Oracle Hierarchical Query (using CONNECT BY [PRIOR]) to get what I want, but I have not figured out how to write it yet. Perhaps YOU can help.

Thank you for your time and consideration.

+1  A: 

I think you'll have better luck with the LAG function:

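-- some_table is a placeholder for the real table name.
-- The alias must be unquoted (or double-quoted); a single-quoted alias is a syntax error.
SELECT s.sliceid
  FROM (SELECT t.sliceid,
               t.personid,
               t.detail3,
               LAG(t.detail3) OVER (PARTITION BY t.personid ORDER BY t.startdt) prev_val
          FROM some_table t) s
 WHERE s.personid = 101
   AND (s.prev_val IS NULL OR s.prev_val != s.detail3)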

Subquery Factoring alternative:

-- Same logic with the inline view moved into a WITH clause; some_table is a placeholder.
WITH slices AS (
  SELECT t.sliceid,
         t.personid,
         t.detail3,
         LAG(t.detail3) OVER (PARTITION BY t.personid ORDER BY t.startdt) prev_val
    FROM some_table t)
SELECT s.sliceid
  FROM slices s
 WHERE s.personid = 101
   AND (s.prev_val IS NULL OR s.prev_val != s.detail3)
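
One caveat, assuming detail3 can ever be NULL (the sample data does not show any): a != comparison evaluates to unknown when either side is NULL, so a change from a value to NULL would be missed, and a NULL that stays NULL would look like a first slice. A rough sketch of one way around that separates "first slice" from "value changed" with ROW_NUMBER() and uses DECODE, which treats two NULLs as equal. The table name some_table is a placeholder.

```sql
-- Sketch only: some_table is a placeholder table name.
SELECT s.sliceid
  FROM (SELECT t.sliceid,
               t.personid,
               t.detail3,
               LAG(t.detail3) OVER (PARTITION BY t.personid ORDER BY t.startdt) prev_val,
               ROW_NUMBER()   OVER (PARTITION BY t.personid ORDER BY t.startdt) rn
          FROM some_table t) s
 WHERE s.personid = 101
   AND (s.rn = 1                                     -- first slice for the person
        OR DECODE(s.prev_val, s.detail3, 0, 1) = 1)  -- NULL-safe "value changed"
```

If detail3 is declared NOT NULL, the simpler predicate is enough.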
OMG Ponies
+1  A: 

In addition to OMG Ponies' answer: if you need to query slices for all persons, drop the personid filter and rely on PARTITION BY to keep each person's history separate:

  SELECT s.sliceid
       , s.personid
    FROM (SELECT t.sliceid,
                 t.personid,
                 t.detail3,
                 LAG(t.detail3) OVER (
                   PARTITION BY t.personid ORDER BY t.startdt
                 ) prev_val
            FROM t) s
   WHERE (s.prev_val IS NULL OR s.prev_val != s.detail3)
egorius
+2  A: 

Here is my take on the LAG() solution, which is basically the same as that of egorius, but I show my workings ;)

SQL> select * from
  2  (
  3      select sliceid
  4             , personid
  5             , startdt
  6             , detail3 as new_detail3
  7             ,  lag(detail3) over (partition by personid 
  8                                    order by startdt) prev_detail3
  9      from some_table
 10  )
 11  where prev_detail3 is null
 12  or ( prev_detail3 != new_detail3 )
 13  /

   SLICEID   PERSONID STARTDT   N P
---------- ---------- --------- - -
         1        101 20-AUG-09 N
         3        101 15-SEP-09 Y N
         4        101 16-SEP-09 N Y
         5        102 10-JAN-09 N
         7        102 02-FEB-09 Y N
         8        103 07-JUL-09 N
         9        104 31-JAN-09 N

7 rows selected.

SQL>

The point about this solution is that it also returns rows for 103 and 104, who have no slice in which detail3 changed. If that is a problem, we can apply an additional filter to return only the people with changes:

SQL> with subq as (
  2      select t.*
  3             , row_number () over (partition by personid
  4                                   order by sliceid ) rn
  5      from
  6          (
  7              select sliceid
  8                     , personid
  9                     , startdt
 10                     , detail3 as new_detail3
 11                     ,  lag(detail3) over (partition by personid
 12                                           order by startdt) prev_detail3
 13              from some_table
 14          ) t
 15      where t.prev_detail3 is null
 16      or ( t.prev_detail3 != t.new_detail3 )
 17       )
 18  select sliceid
 19         , personid
 20         , startdt
 21         , new_detail3
 22         , prev_detail3
 23  from subq sq
 24  where exists ( select null from subq x
 25                 where x.personid = sq.personid
 26                 and   x.rn > 1 )
 27  order by sliceid
 28  /

   SLICEID   PERSONID STARTDT   N P
---------- ---------- --------- - -
         1        101 20-AUG-09 N
         3        101 15-SEP-09 Y N
         4        101 16-SEP-09 N Y
         5        102 10-JAN-09 N
         7        102 02-FEB-09 Y N

SQL>

edit

As egorius points out in the comments, the OP does want hits for all users, even if they haven't changed, so the first version of the query is the correct solution.
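
The question talks about "some pertinent bits", so more than one column may matter. The same pattern seems to extend by adding one LAG() per column. A sketch, assuming detail3 and detail4 are the columns of interest and that neither can be NULL (both choices are mine, for illustration):

```sql
-- Sketch: detail3 and detail4 are picked for illustration only.
select sliceid
       , personid
       , startdt
from (
    select sliceid
           , personid
           , startdt
           , detail3
           , detail4
           , lag(detail3) over (partition by personid order by startdt) prev_detail3
           , lag(detail4) over (partition by personid order by startdt) prev_detail4
           , row_number() over (partition by personid order by startdt) rn
    from some_table
)
where rn = 1                     -- first slice for each person
or    prev_detail3 != detail3    -- detail3 changed
or    prev_detail4 != detail4    -- detail4 changed
order by sliceid
```

Against the sample data this would additionally return SliceID 10, where Detail4 goes from 12 to 13.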

APC
Slowly getting closer and closer :) Although daddy6Elbows says he wants SliceID 8 for PersonID 103 and SliceID 9 for PersonID 104.
egorius
Thanks. Everyone had good answers, but I've got to give the nod to the most complete one--including examples and extra commentary. But I gave everybody a point because they were all technically correct.
daddy6Elbows