We're starting to load a data warehouse with data from event logs. We have a normal star schema where a row in the fact table represents one event. Our dimension tables are a typical combination of user_agent, ip, referal, page, etc. One dimension table looks like this:
create table referal_dim(
    id integer,           -- surrogate key, autogenerated during load
    domain varchar(255),
    subdomain varchar(255),
    page_name varchar(4096),
    query_string varchar(4096),
    path varchar(4096)
)
We autogenerate the id to eventually join against the fact table.

My question: what's the best way to identify duplicate records in our bulk load process? We load all the records for a log file into temp tables before doing the actual insert into the persistent store; however, the id is just auto-incremented, so two identical dim records loaded on two different days would end up with different ids. Comparing on every value column individually seems like it would be slow. Are there any best practices for a situation like this? Would it be appropriate to create a hash of the value columns and compare on that instead?
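Concretely, something like this rough sketch is what I have in mind (md5() and || concatenation assume Postgres-style syntax, referal_temp is just a stand-in name for our temp table, and the alter/index statements are one possible way to persist the hash, not something we already have):

-- Rough sketch only. Assumes Postgres-style md5() and || concatenation;
-- referal_temp is a stand-in for the per-log-file temp table.

-- Store a hash of the value columns on the dim and index it for the lookup.
alter table referal_dim add column row_hash char(32);
create index referal_dim_row_hash_idx on referal_dim (row_hash);

-- Insert only staging rows whose hash isn't already in the dim.
-- coalesce() and the '|' separator keep NULLs and adjacent columns from
-- producing identical concatenations; distinct handles dupes within one file.
-- id is omitted on the assumption it's auto-generated, as described above.
insert into referal_dim (domain, subdomain, page_name, query_string, path, row_hash)
select distinct s.domain, s.subdomain, s.page_name, s.query_string, s.path, s.row_hash
from (
    select domain, subdomain, page_name, query_string, path,
           md5(coalesce(domain, '')       || '|' ||
               coalesce(subdomain, '')    || '|' ||
               coalesce(page_name, '')    || '|' ||
               coalesce(query_string, '') || '|' ||
               coalesce(path, '')) as row_hash
    from referal_temp
) s
where not exists (
    select 1 from referal_dim d where d.row_hash = s.row_hash
);

Is a stored, indexed hash like this a reasonable way to do the comparison, or is there a more standard pattern?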