views:

1319

answers:

3

I have a data warehouse containing typical star schemas, and a whole bunch of code which does stuff like this (obviously a lot bigger, but this is illustrative):

SELECT cdim.x
    ,SUM(fact.y) AS y
    ,dim.z
FROM fact
INNER JOIN conformed_dim AS cdim
    ON cdim.cdim_dim_id = fact.cdim_dim_id
INNER JOIN nonconformed_dim AS dim
    ON dim.ncdim_dim_id = fact.ncdim_dim_id
INNER JOIN date_dim AS ddim
    ON ddim.date_id = fact.date_id
WHERE fact.date_id = @date_id
GROUP BY cdim.x
    ,dim.z

I'm thinking of replacing it with a view (MODEL_SYSTEM_1, say), so that it becomes:

SELECT m.x
    ,SUM(m.y) AS y
    ,m.z
FROM MODEL_SYSTEM_1 AS m
WHERE m.date_id = @date_id
GROUP BY m.x
    ,m.z

But the view MODEL_SYSTEM_1 would have to contain unique column names, and I'm also concerned about performance with the optimizer if I go ahead and do this, because I'm concerned that all the items in the WHERE clause across different facts and dimensions get optimized, since the view would be across a whole star, and views cannot be parametrized (boy, wouldn't that be cool!)

So my questions are -

  1. Is this approach OK, or is it just going to be an abstraction which hurts performance and doesn't give my anything but a lot nicer syntax?

  2. What's the best way to code-gen these views, eliminating duplicate column names (even if the view later needs to be tweaked by hand), given that all the appropriate PK and FKs are in place? Should I just write some SQL to pull it out of the INFORMATION_SCHEMA or is there a good example already available.

Edit: I have tested it, and the performance seems the same, even on the bigger processes - even joining multiple stars which each use these views.

The automation is mainly because there are a number of these stars in the data warehouse, and the FK/PK has been done properly by the designers, but I don't want to have to pick through all the tables or the documentation. I wrote a script to generate the view (it also generates abbreviations for the tables), and it works well to generate the skeleton automagically from INFORMATION_SCHEMA, and then it can be tweaked before committing the creation of the view.

If anyone wants the code, I could probably publish it here.

+1  A: 

Make the view or views into into one or more summary fact tables and materialize it. These only need to be refreshed when the main fact table is refreshed. The materialized views will be faster to query and this can be a win if you have a lot of queries that can be satisfied by the summary.

You can use the data dictionary or information schema views to generate SQL to create the tables if you have a large number of these summaries or wish to change them about frequently.

However, I would guess that it's not likely that you would change these very often so auto-generating the view definitions might not be worth the trouble.

ConcernedOfTunbridgeWells
I'm not following this - if I'm flattening the entire star into effectively a table indexed a different way, what was the point of the dimensional model in the first place?
Cade Roux
Not flattening, rolling up. If you're rolling up the data, you should consider materializing the views. This will be faster.
ConcernedOfTunbridgeWells
All the different queries do different things - this would only be a base view of the star without any analytical interpretation which would be useful to eliminate repetitive copy-paste, research and allow an abstraction where the view could correct irregularities in the model as well.
Cade Roux
+1  A: 
  1. I’ve used this technique on several data warehouses I look after. I have not noticed any performance degradation when running reports based off of the views versus a table direct approach but have never performed a detailed analysis.

  2. I created the views using the designer in SQL Server management studio and did not use any automated approach. I can’t imagine the schema changing often enough that automating it would be worthwhile anyhow. You might spend as long tweaking the results as it would have taken to drag all the tables onto the view in the first place!

To remove ambiguity a good approach is to preface the column names with the name of the dimension it belongs to. This is helpful to the report writers and to anyone running ad hoc queries.

Brad_Z
+1  A: 

If you happen to use MS SQL Server, you could try an Inline UDF which is as close to a parameterized view as it gets.

Damir Sudarevic
Inline table-value functions are great for requiring that the caller provide date restrictions which is great in the DW usage scenario.
Cade Roux