views:

139

answers:

1

Hello,

Recently I been working on project, where need to populated Dim Tables from EDW Tables.

EDW Tables are of type II which does maintain historical data. When comes to load Dim Table, for which source may be multiple EDW Tables or would be single table with multi level pivoting (on attributes).

Mean: There would be 10 records - one for each attribute which need to be pivoted on domain_code to make a single row in Dim. Out of these 10 records there would be some attributes with same domain_code but with different sub_domain_code, which needs further pivoting on subdomain code.

Ex:

if i got domain code: 01,02, 03 => which are straight pivot on domain code I would also have domain code: 10 with subdomain code / version as 2006,2007,2008,2009

That means I need to split my source table with above attributes into two => one for domain code and other for domain_code + version.

so far so good.

When it comes to load Dim Table:

As per design specs for Dimensions (originally written by third party), what they want is:

for every single change in EDW (attribute), it should assemble all the related records (for that NK) mean new one with other attribute values which are current => process them to create a new dim record and insert it.

That mean if a single extract contains 100 records updated (one for each NK), it should assemble 100 + (100*9) records to insert / update dim table. How good is this approach.

Other way I tried to do is just do a lookup into dim table for that NK get the value's of recent records (attributes which not changed) and insert it and update the current one.

What would be the better approach assembling records at source side for one attribute change or looking into dim table's recent record and process it.

If this doesn't make sense, would like to elaborate it further.

Thanks

Here is the model of the tables

alt text

+1  A: 

Have a look at this example.

It should be relatively straightforward.

It pivots the base data according to your rules.

It determines the change times for the denormalized "row"

It creates a triangular join to determine the start and end of each period (what I'm calling a snapshot)

Then it joins those windows to the base data to determine what the state of the data was at that time (the pivot is actually completed at this time)

I think you may need to look at the windowing mechanism - it's returning the right data, but I don't like the way the window overlap logic looks to me - it doesn't quite small right - I'm worried about the boundary conditions.

-- SO3014289

CREATE TABLE #src (
    key1 varchar(4) NOT NULL
    ,key2 varchar(3) NOT NULL
    ,key3 varchar(3) NOT NULL
    ,AttribCode int NOT NULL
    ,AttribSubCode varchar(2)
    ,Value varchar(10) NOT NULL
    ,[Start] date NOT NULL
    ,[End] date NOT NULL
)

INSERT INTO #src VALUES
('9750', 'C04', '789', 1, NULL, 'AAA', '1/1/2000', '12/31/9999')
,('9750', 'C04', '789', 2, NULL, 'BBB', '1/1/2000', '12/31/9999')
,('9750', 'C04', '789', 3, 'V1', 'XXXX', '1/1/2000', '12/31/9999')
,('9750', 'C04', '789', 3, 'V2', 'YYYY', '1/1/2000', '1/2/2000')
,('9750', 'C04', '789', 3, 'V2', 'YYYYY', '1/2/2000', '12/31/9999')

;WITH basedata AS (
    SELECT key1 + '-' + key2 + '-' + key3 AS NK
    ,CASE WHEN AttribCode = 1 THEN Value ELSE NULL END AS COL1
    ,CASE WHEN AttribCode = 2 THEN Value ELSE NULL END AS COL2
    ,CASE WHEN AttribCode = 3 AND AttribSubCode = 'V1' THEN Value ELSE NULL END AS COL3
    ,CASE WHEN AttribCode = 3 AND AttribSubCode = 'V2' THEN Value ELSE NULL END AS COL4
    ,[Start]
    ,[End]
    FROM #src
)
,ChangeTimes AS (
    SELECT NK, [Start] AS Dt
    FROM basedata
    UNION 
    SELECT NK, [End] AS Dt
    FROM basedata
)
,Snapshots as (
    SELECT s.NK, s.Dt AS [Start], MIN(e.Dt) AS [End]
    FROM ChangeTimes AS s
    INNER JOIN ChangeTimes AS e
        ON e.NK = s.NK
        AND e.Dt > s.Dt
    GROUP BY s.NK, s.Dt
)
SELECT Snapshots.NK
    ,MAX(COL1) AS COL1
    ,MAX(COL2) AS COL2
    ,MAX(COL3) AS COL3
    ,MAX(COL4) AS COL4
    ,Snapshots.[Start]
    ,Snapshots.[End]
FROM Snapshots
INNER JOIN basedata
    ON basedata.NK = Snapshots.NK
    AND NOT (basedata.[End] <= Snapshots.[Start] OR basedata.[Start] >= Snapshots.[End])
GROUP BY Snapshots.NK
    ,Snapshots.[Start]
    ,Snapshots.[End]
Cade Roux