views:

56

answers:

3

My fact table holds a user score in a course he took. Some of the details of the course, which I have to show on the report, comes from more then one table (in the actual OLTP db).
Do I create a none normalized version of that course entry in a dimension table?
Or do I just join the fact table directly to the course table join to the other tables that describe this course (course_type,faculty who created this course etc)

+1  A: 

Maybe I do not understand your question, but a fact table in a star schema is supposed to be joined to dimension tables surrounding it. If you do not feel like making joins, simply create a view, and use the view for reporting.

If you were to post a model (schema), it would be easier to comment/help.

Damir Sudarevic
I have no problem with a single step join to a dimension. But lets say this dimension holds further foreign keys, which means I have to further join the **dimension** table to some lookup table or even more tables.
Itay Moav
Sounds like a snowflake design. Nothing wrong with that, providing that the DW designer had a solid reason why to snowflake.
Damir Sudarevic
+2  A: 

Snowflaking or bridge tables do make the joins more complicated, and not just from a coding perspective, it also makes it less simple for BI users.

In most cases, I would put these directly in existing or additional dimension tables.

For instance, you have a scores fact table, which has the user details in a dimension which may or may not hold demographics on the user (perhaps it's only a bridge). Sometimes it is better to split out demographic information. So even though the gender and age might be associated with a user entity, in the dimensional model, these might be individual dimensions or lumped into a single dimension - all depending on the usage scenarios.

Perhaps your scores are attached to a state and states have regions (snowflake). It might be far more efficient for analysis to have the region dimension linked directly instead of going through the state dimension.

I think what you will find is that the dimensional model is a very pragmatic denormalization approach. The main things which are non-negotiable are the facts - after that the choice of dimensions is very much informed by the behavior of the data, your foresight for common usage scenarios - and avoiding falling into the too few dimensions and too many dimensions problems.

Cade Roux
+1  A: 

It is a common practice to consolidate several dimensions together, sacrificing normalization in favor of performance. This is usually done when your typical query will need all dimensions together (as opposed to using different bits for different use cases).

Also remember that while you receive a reduction in join overhead, there are some drawbacks:

  • Loss of flexibility, which might hinder development as the warehouse expands
  • Full table scans take longer (in traditional row-based RDBMS such as SQL Server)
  • Disk space consumption

You will have to consider each case separately.

It might be worthwhile to also consider the option of creating a materialized view, if such ability is offered by your RDBMS.

shmichael