views:

292

answers:

3

We're doing a complex bit of data accumulation. Our customer sends us some stuff that includes two dimensions (time and a business unit). Time is mostly year-month. The business unit dimension has just a few attributes: a name, and a few categories to which BU's can belong for reporting and analysis purposes.

The stuff they send us includes some current state information (dates and codes). These seem fact-like. They also send some information that characterizes the relationship with the business unit (mostly additional codes). Again, these are unique to the business unit and time period.

Finally, they send us stuff that is clearly additive facts. It includes currency and counts that have proper units.

Should I commingle this qualitative information in a single fact table with the additive facts? Or should I separate the qualitative stuff (which can only be used with counts) from the quantitative stuff (which can be used with sum)?

+2  A: 

If the data is both directly related to the additive fact and is not something you want to be grouping/sorting/search on, then putting it in the fact table is okay.

Be aware, though, that non-additive data in the fact table will either prevent roll-ups or will become a lossy operation.

Brad Wilson
By "lossy" you mean that details will be lost when I aggregate the other values in the fact table?
S.Lott
+2  A: 

Only put things in the fact table if they are degenerate (causing a high-cardinality/uniqueness problems in your dimension where it takes the dimension to a 1-1 relationship to the fact table). Kimball recommends avoiding the temptation to put anything but degenerate dimensions in with the facts (unique order number, for instance).

You can always put these in what Kimball calls a "junk" dimension. All those codes can simply be lumped into a junk dimension. Most dates would go in the fact table as keys into your date dimension in a particular role (usually with a natural int key of the form YYYYMMDD - one of the only times we don't use a non-identity meaningless surrogate key)

I like to naively view the star as all the facts and then which columns go into which dimensions is simply determined by convenience. One should not necessarily view them as corresponding to a particular business entity - remember, the star is not an ERD-style normalized OLTP database.

Cade Roux
+1  A: 

Brad Wilson accurately describes the risk of adding them to your fact table. In the past, I've added junk attributes to my fact table only to require refactoring later.

The stuff they send us includes some current state information (dates and codes). These seem fact-like. They also send some information that characterizes the relationship with the business unit (mostly additional codes). Again, these are unique to the business unit and time period.

What business purpose do the dates serve? Offhand, I'd recommend making these their own dimensions and describe them accurately.

How volatile are the extra codes that come in? If the grain of your fact table is date and BU, why can't they be included in the BU dimension and treated as slowly changing attributes?

Without more details I can't make a firm recommendation but these would be the first questions I'd ask myself.

TrickyNixon