views:

100

answers:

2

So, I think I understood what to put in dimensions, what in the fact table, and how to achieve this. Now I got the problem, that I have this dimension 'product' and a dimension 'productProperties'. I had to split this, cause otherwise my natural key in 'product' wouldn`t have been unique anymore. I asked this in this question.

So my 'productProperties' dimension table should have looked like this: Color | Material | Size

1.)To achieve this, I had to create every possible permutation of the values of 'color', 'material', 'size' and so on, right?

This would have been far over 200 million rows, so I decided to split this up. I have a dimension 'Color' now, which consists actually of the columns 'color', 'colorFront', 'colorBack'.

2.) That`s fine I guess, but what about the dimension 'size', which consists only of the columns 'surrogate_key' and 'value'?

I`ve read about 'degenerate dimensions' (in the reading recommendation given in my other question), which means, to make 'single column dimensions' one column in the fact table. That seems a bit impractical to me, since I would end up with about 5-6 extra columns in my fact table.

What if I should do this?

3.) Are those degenerate dimensions part of the primary key in the fact table?

Most important question: I will have entries in my fact table with products, that don`t match every column in my dimensions or not all dimensions at all. Meaning, I may have an entry / product, which has the property 'color', but not 'colorFront' or 'colorBack'. Since I created every permutation of 'color', 'colorFront' and 'colorBack', when trying to populate my fact table I will get multiple surrogate keys, if the product only has the property 'color' which results in kind of duplicates in my fact table, right?

4.) So do I have to filter those duplicates out, when querying my fact table? Or is this wrong at all?

I could of course split dimension 'color' in three dimensions. But then I will get entries with NULL values in some columns. Same thing with entries / products that don`t use some dimensions at all.

5.) How to handle those NULL values?

Thanks in advance for any help.

+3  A: 

Every dimension has:

  • primary key (DateKey, TimeKey, ProductKey, ...)
  • business key (FullDate, ProductFullName, ColorNaturalKey, ...)
  • row with value 'unknown' (Key = 0, BusinessKey = 'unknown', all other = 'n/a')
  • row with value 'n/a' (Key = -1, BusinessKey ='n/a', all other = 'n/a')

In the Color table, columns Color, ColorFront and ColorBack all have values of 'n/a' and 'unknown' -- so those should be included in permutations. This way there is always a row in the dimension table to point to.

You may choose to make the size a degenerate dimension by moving the SizeValue into the fact table and dropping the dimSize.

alt text

Damir Sudarevic
If the same productkey comes in more than one size then SizeKey has to be part of the fact PK, I could by a size XL and XXL of the same shirt at the same time... one for me and one for my brother. I could buy two different colors of the same product key. I almost always buy the same shirt in more than one color. So too should color be part of the PK. In fact if there's a FK in the Fact table that isn't part of the PK, it's describing an existing Dimension and should probably be there.
Stephanie Page
@Stephanie True, the composite primary key should be selected more carefully. Modified the model as per your comment.
Damir Sudarevic
@Damir And if I don`t have a ColorNaturalKey, should I split it up then, to have dimensions similar to dimMaterial?
tombom
@tombom -- you need a business (natural) key of some kind to be able to fetch the primary key during loading. So invent one, for example `'c=blue-f=blue-b=na'` will do.
Damir Sudarevic
@Damir Sorry, I don`t get it. My business key would be the full name of the color. Just like you did it with dimMaterial.
tombom
@tombom -- business (natural) key uniquely identifies an object (entity) that a dimension row describes. For example, `Color = blue; ColorFront = blue; ColorBack = blue` and `Color = blue; ColorFront = blue; ColorBack = na` are two different color-entities and would have different natural keys. You could also move front and back colors into the fact table as keys, to have ColorKey, FrontColorKey and BackColorKey; then drop columns from the dimension.
Damir Sudarevic
Thanks for your help and your patience :)
tombom
+2  A: 

1) Whoever told you:

So my 'productProperties' dimension table should have looked like this: Color | Material | Size

was either wrong or you misunderstood.

That idea is called a "Junk Dimension". And it doesn't have to contain the Cartesian product to begin with. It can be loaded like any other dimension. If a combination is needed in the fact table and not in the dimension, you add it then. Cartesian-ing it to begin with is a convenience but then you better know that when a new color is added that you have to re-Cartesian. Better to load when needed and not worry about it.

2) OK so now I've read your whole question and realize that you're reading about Dimensional modeling but it looks like you're skimming it.

A degenerate dimension is something like Purchase Order number. It's not a fact. You can't SUM it. But it's not a Dimension either since there's nothing more than needs to be said about PO123210413. It's NOT a FK to anywhere.

Stephanie Page
My whole ETL process looks like this, that I retrieve the rows that were created or modified in my source database and store them in the appropriate tables in my data-warehouse. After that I rebuild my dimensions or change or add columns accordingly. I did so because I thought of dimensions as describing elements. So the end user can e.g. query for a product and is not confused if the result is, that given product wasn`t purchased yet. Thought that he may be confused if he wasn`t able to browse for a product (cause of not having been purchased yet). Is this not good practice?
tombom
Well that's backwards... how can you add a row to a fact table if you don't have a FK for a new dimension value? If you have 1 = Red, 2 = Blue, when the first Yellow shirt is added, what would the FK in the fact table be?
Stephanie Page
I rebuild my dimensions (or rather add / change the newly created / modified rows) each time the ETL process is running. Therefore it is no problem to insert the yellow shirt in the fact table.
tombom