views:

624

answers:

2

Hi all,
I'm facing a problem which I've spent a great deal of time trying to address, and although I have a solution, it's clunky and involves pl/sql processing, and I was wondering what others might come up with.

I'm working with a dataset that creates a new row every time a record is changed, thus maintaining a history. The most up-to-date version is then displayed in our application. Consider a table with the following data:

Person ID  Last_Name  Address_line1       Effective_Start_Date  Effective_End_Date
4913       Jones      1 First Street      03-aug-02             31-dec-12
4913       Cross      1 First Street      01-feb-02             02-aug-02
4913       Cross      86 Green Avenue     01-mar-01             31-jan-02
4913       Cross      87 Devonshire Road  01-jan-90             28-feb-02

As part of a report, I need to extract the details which have changed between a given set of dates. For instance, say I want to extract the current address_line1 and the previous address_line1 along with the date of change (effective_start_date when the new address was added). The caveat is that if other column data changes, this will create a new row too. For instance, in the example above, the last_name changed after the address changed.

Unfortunatey, the query must be generic so that it can be run as part of a report,i.e. not having to specify explicitly the effective start and end dates.

Hope that all makes sense. Hopefully, you're all still with me. So, given the data-set above, I would expect to see the following results in my report:

Person ID  Surname  Address_line1   Prev_Address_line1  Effective Start date of New Address Line 1
4913       Jones    1 First Street  86 Green Avenue     01-feb-02

My approach involves processing with pl/sql and looping over a considerable number of records but I was wondering if this can be done in a single sql query.

Does anyone have any ideas on whether this can be done using only sql?

+3  A: 
SELECT  personID, surname, address_line1,
        LAG(address_line1) OVER (PARTITION BY personID ORDER BY effectiveDate) AS prev_address_line1
FROM    mytable
WHERE   personID = :myid
ORDER BY
        effectiveDate

To return the last effective value, use:

SELECT  *
FROM    (
        SELECT  personID, surname, address_line1,
                LAG(address_line1) OVER (PARTITION BY personID ORDER BY effectiveDate) AS prev_address_line1,
                ROW_NUMBER() OVER (PARTITION BY personID ORDER BY effectiveDate DESC) AS rn,
        FROM    mytable
        WHERE   personID = :myid
        )
WHERE   rn = 1
Quassnoi
A: 

Don't Completely Understand your Question.

As part of a report, I need to extract the details which have changed between a given set of dates.

Versus

Unfortunatey, the query must be generic so that it can be run as part of a report,i.e. not having to specify explicitly the effective start and end dates.

1) Does your Report provide a start date and End date, and look for changes within the range?. 2) Or are you looking for a general report for all changes with no constraints on date?

Also

Assuming the Last Name Changed within the set of dates for which you are running the report.

1) In your expected result are you completely ignoring the Fact that the Last_Name Changed

2) Or will that be another row in your Report.

of the type

Person ID  Surname  Prev_Surname   Effective_Start_date_of_New Surname
4913       Jones    Cross          03-aug-02

3) Or is it even more complicated that you will capture all changes in one row.

Person ID  Surname  Prev_Surname   Effective_Start_date_of_New Surname Address_line1   Prev_Address_line1  Effective_Start_date_of_New_Address_Line_1 
4913       Jones    Cross          03-aug-02                           1 First Street  86 Green Avenue     01-feb-02

For a Report which shows each change, with the previous values for name and address. Try this. You can pick the last change by an outer select and choosing Change_rank =1

SELECT   PERSON_ID, CHANGED, NEW_VAL, OLD_VAL, EFFECTIVE_START_DATE
        -- The Rank just lets you pick by order of change incase you only want the last change, or last 2 changes
         ,RANK () OVER (ORDER BY EFFECTIVE_START_DATE DESC) CHANGE_RANK
    FROM (
          -- A select over Partition for each column where Old_val and New_Val need to be retrieved.
          -- In this Select the Address Column and Changed Address Are retrieved.
          SELECT PERSON_ID, 'ADDRESS_LINE1 CHANGE' CHANGED, ADDRESS_LINE1 NEW_VAL
                ,LEAD (ADDRESS_LINE1) OVER (PARTITION BY PERSON_ID ORDER BY EFFECTIVE_START_DATE DESC) AS OLD_VAL
                ,EFFECTIVE_START_DATE
            FROM TMP_RM_TEST
           -- Usually You want to run a report for a period,  Or remove this entire filter
          WHERE  EFFECTIVE_START_DATE BETWEEN TO_DATE (:REPORT_START_DATE, 'YYYYMMDD')
                                          AND TO_DATE (:REPORT_END_DATE, 'YYYYMMDD')
          UNION
          -- In this Select the Name Column and Changed Name Are retrieved.
          SELECT PERSON_ID, 'LAST_NAME CHANGE' CHANGED, LAST_NAME NEW_VAL
                ,LEAD (LAST_NAME) OVER (PARTITION BY PERSON_ID ORDER BY EFFECTIVE_START_DATE DESC) AS OLD_VAL
                ,EFFECTIVE_START_DATE
            FROM TMP_RM_TEST
           WHERE EFFECTIVE_START_DATE BETWEEN TO_DATE (:REPORT_START_DATE, 'YYYYMMDD')
                                          AND TO_DATE (:REPORT_END_DATE, 'YYYYMMDD') )
   -- Since the Partition By lists all changes over Person_ID, Irrespective of whether the Column changed, Filter to Remove Lines where no change was made to the Coloumn monitored
WHERE    NEW_VAL <> OLD_VAL
ORDER BY CHANGE_RANK
Reuben