So, I think I understood what to put in dimensions, what in the fact table, and how to achieve this. Now I got the problem, that I have this dimension 'product' and a dimension 'productProperties'. I had to split this, cause otherwise my natural key in 'product' wouldn`t have been unique anymore. I asked this in this question.
So my 'productProperties' dimension table should have looked like this: Color | Material | Size
1.)To achieve this, I had to create every possible permutation of the values of 'color', 'material', 'size' and so on, right?
This would have been far over 200 million rows, so I decided to split this up. I have a dimension 'Color' now, which consists actually of the columns 'color', 'colorFront', 'colorBack'.
2.) That`s fine I guess, but what about the dimension 'size', which consists only of the columns 'surrogate_key' and 'value'?
I`ve read about 'degenerate dimensions' (in the reading recommendation given in my other question), which means, to make 'single column dimensions' one column in the fact table. That seems a bit impractical to me, since I would end up with about 5-6 extra columns in my fact table.
What if I should do this?
3.) Are those degenerate dimensions part of the primary key in the fact table?
Most important question: I will have entries in my fact table with products, that don`t match every column in my dimensions or not all dimensions at all. Meaning, I may have an entry / product, which has the property 'color', but not 'colorFront' or 'colorBack'. Since I created every permutation of 'color', 'colorFront' and 'colorBack', when trying to populate my fact table I will get multiple surrogate keys, if the product only has the property 'color' which results in kind of duplicates in my fact table, right?
4.) So do I have to filter those duplicates out, when querying my fact table? Or is this wrong at all?
I could of course split dimension 'color' in three dimensions. But then I will get entries with NULL values in some columns. Same thing with entries / products that don`t use some dimensions at all.
5.) How to handle those NULL values?
Thanks in advance for any help.