ansaurus

Question

Answer 1

A:

Hi,

You can use the Sort Transform in SSIS to sort your data set by more than one column. Simply sort by your primary key (or ID field) followed by your timestamp column in descending order, so that the most recent row for each key comes first.
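The same ordering can be sketched in plain Python, outside SSIS. This is only an analogy. The row layout and the `latest_per_key` helper are hypothetical, and keeping just the first row per key is an assumed follow-up step that the answer does not spell out.

```python
from datetime import datetime
from typing import Iterable, List, Tuple

# Hypothetical staging row layout: (key, timestamp, payload)
Row = Tuple[int, datetime, str]


def latest_per_key(rows: Iterable[Row]) -> List[Row]:
    """Return the most recent row for each key, ordered by key."""
    # Two stable sorts give "key ascending, timestamp descending":
    # sort by the secondary column first, then by the primary column.
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    ordered = sorted(ordered, key=lambda r: r[0])
    result: List[Row] = []
    for row in ordered:
        # The first row seen for a key is its newest one; skip the rest.
        if not result or result[-1][0] != row[0]:
            result.append(row)
    return result


rows = [
    (1, datetime(2009, 3, 1), "old"),
    (2, datetime(2009, 3, 2), "only"),
    (1, datetime(2009, 3, 5), "new"),
]
print(latest_per_key(rows))
```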

See the following article for more details on working with the Sort Transformation:

http://msdn.microsoft.com/en-us/library/ms140182.aspx

Make sense?

Cheers, John

John Sansom 2009-03-06 14:38:17

Answer 2

A:

This will remove rows that match on Col1, Col2, etc. and whose UpdateDate is NOT the most recent:

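-- Delete each row D that has a matching row T with a later UpdateDate,
-- leaving only the most recent row for each combination of key columns
DELETE D
FROM   MyTable AS D
       JOIN MyTable AS T
           ON T.Col1 = D.Col1
          AND T.Col2 = D.Col2
          ...  -- add a condition for each further column that defines a duplicate
          AND T.UpdateDate > D.UpdateDate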
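To try the idea without SQL Server, here is a sketch using Python's built-in `sqlite3` module. SQLite does not accept T-SQL's `DELETE D FROM ... JOIN` form, so this swaps in an equivalent correlated `WHERE EXISTS` subquery. The sample rows, and treating Col1 and Col2 as the only matching columns, are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Col1 TEXT, Col2 TEXT, UpdateDate TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [
        ("a", "x", "2009-03-01"),  # superseded by the next row
        ("a", "x", "2009-03-05"),  # most recent for (a, x)
        ("b", "y", "2009-03-02"),  # no duplicate
    ],
)

# Delete every row for which another row with the same Col1/Col2 has a later
# UpdateDate (ISO date strings compare correctly as text).
conn.execute(
    """
    DELETE FROM MyTable
    WHERE EXISTS (
        SELECT 1
        FROM MyTable AS T
        WHERE T.Col1 = MyTable.Col1
          AND T.Col2 = MyTable.Col2
          AND T.UpdateDate > MyTable.UpdateDate
    )
    """
)
remaining = conn.execute(
    "SELECT Col1, Col2, UpdateDate FROM MyTable ORDER BY Col1"
).fetchall()
conn.close()
print(remaining)
```

As with the T-SQL version, if two rows share the same key columns and the same UpdateDate, neither is strictly later, so both are kept.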

If Col1 and Col2 should be considered "matching" when both are NULL, you would need to use:

       ON (T.Col1 = D.Col1 OR (T.Col1 IS NULL AND D.Col1 IS NULL))
      AND (T.Col2 = D.Col2 OR (T.Col2 IS NULL AND D.Col2 IS NULL))
      ...
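The effect of the NULL handling can be checked with Python's `sqlite3` module, again standing in for SQL Server and using a `WHERE EXISTS` form of the delete (an assumption, since SQLite lacks `DELETE ... FROM ... JOIN`). With a plain `=` comparison, two rows whose Col1 is NULL never match, so the older one survives; the `OR ... IS NULL` form treats them as duplicates. The `keep_latest` helper and the sample data are hypothetical.

```python
import sqlite3

PLAIN_MATCH = "T.Col1 = MyTable.Col1"
NULL_SAFE_MATCH = "(T.Col1 = MyTable.Col1 OR (T.Col1 IS NULL AND MyTable.Col1 IS NULL))"


def keep_latest(match_condition: str) -> list:
    """Run the keep-latest delete using the given Col1 match condition."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (Col1 TEXT, UpdateDate TEXT)")
    conn.executemany(
        "INSERT INTO MyTable VALUES (?, ?)",
        [(None, "2009-03-01"), (None, "2009-03-05"), ("a", "2009-03-02")],
    )
    conn.execute(
        f"""
        DELETE FROM MyTable
        WHERE EXISTS (
            SELECT 1 FROM MyTable AS T
            WHERE {match_condition}
              AND T.UpdateDate > MyTable.UpdateDate
        )
        """
    )
    rows = conn.execute("SELECT Col1, UpdateDate FROM MyTable ORDER BY UpdateDate").fetchall()
    conn.close()
    return rows


print(keep_latest(PLAIN_MATCH))      # both NULL rows survive
print(keep_latest(NULL_SAFE_MATCH))  # only the newer NULL row survives
```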

Edit: If you need a case-sensitive comparison on a case-insensitive database, use the following on VARCHAR and TEXT columns:

       ON (T.Col1 = D.Col1  COLLATE Latin1_General_BIN 
           OR (T.Col1 IS NULL AND D.Col1 IS NULL))
       ...
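A sketch of the same idea in SQLite via Python's `sqlite3` module. SQLite has no `Latin1_General_BIN`; its built-in `BINARY` collation plays that role here, and declaring the column `COLLATE NOCASE` stands in for a case-insensitive database. Both substitutions, the `keep_latest_by_case` helper, and the data are assumptions for illustration; the delete uses `WHERE EXISTS` because SQLite lacks `DELETE ... FROM ... JOIN`.

```python
import sqlite3


def keep_latest_by_case(case_sensitive: bool) -> list:
    """Keep the latest row per Col1, optionally comparing Col1 case-sensitively."""
    conn = sqlite3.connect(":memory:")
    # NOCASE makes plain comparisons on Col1 case-insensitive by default.
    conn.execute("CREATE TABLE MyTable (Col1 TEXT COLLATE NOCASE, UpdateDate TEXT)")
    conn.executemany(
        "INSERT INTO MyTable VALUES (?, ?)",
        [("abc", "2009-03-01"), ("ABC", "2009-03-05")],
    )
    # An explicit postfix COLLATE BINARY overrides the column's NOCASE collation.
    collate = " COLLATE BINARY" if case_sensitive else ""
    conn.execute(
        f"""
        DELETE FROM MyTable
        WHERE EXISTS (
            SELECT 1 FROM MyTable AS T
            WHERE T.Col1 = MyTable.Col1{collate}
              AND T.UpdateDate > MyTable.UpdateDate
        )
        """
    )
    rows = conn.execute("SELECT Col1 FROM MyTable ORDER BY UpdateDate").fetchall()
    conn.close()
    return rows


print(keep_latest_by_case(case_sensitive=False))  # [('ABC',)]
print(keep_latest_by_case(case_sensitive=True))   # [('abc',), ('ABC',)]
```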

Kristen 2009-03-06 15:52:59

Answer 3

A:

Does it make sense to just ignore the duplicates when moving rows from the staging table to the final table?

You have to do this anyway, so why not issue one query against the staging table rather than two?

INSERT final
    ([key], col1, col2)
SELECT
    s.[key], s.col1, s.col2
FROM
    staging s
    JOIN
    -- latest timestamp for each key
    (SELECT [key], MAX(datetimestamp) AS maxdt
     FROM staging
     GROUP BY [key]) ms
        ON s.[key] = ms.[key] AND s.datetimestamp = ms.maxdt
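A runnable sketch of this single-pass copy, using Python's `sqlite3` module as a stand-in for SQL Server. The table layout and sample rows are hypothetical; `key` is double-quoted because KEY is a keyword in SQLite (it is also reserved in T-SQL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE staging ("key" INTEGER, col1 TEXT, col2 TEXT, datetimestamp TEXT)')
conn.execute('CREATE TABLE final ("key" INTEGER, col1 TEXT, col2 TEXT)')
conn.executemany(
    "INSERT INTO staging VALUES (?, ?, ?, ?)",
    [
        (1, "old", "x", "2009-03-01 10:00"),
        (1, "new", "y", "2009-03-06 09:30"),  # latest for key 1
        (2, "only", "z", "2009-03-02 08:00"),
    ],
)

# Copy only the most recent staging row per key into final in one query.
conn.execute(
    """
    INSERT INTO final ("key", col1, col2)
    SELECT s."key", s.col1, s.col2
    FROM staging AS s
    JOIN (
        SELECT "key", MAX(datetimestamp) AS maxdt
        FROM staging
        GROUP BY "key"
    ) AS ms
      ON s."key" = ms."key" AND s.datetimestamp = ms.maxdt
    """
)
copied = conn.execute('SELECT * FROM final ORDER BY "key"').fetchall()
print(copied)
```

If two staging rows share both the key and the maximum timestamp, both are copied, so a tie-breaker would be needed if final must hold exactly one row per key.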

gbn 2009-03-06 19:47:08

Remove duplicate from a staging file