I am a newcomer to data warehouses and have what I hope is an easy question about building a star schema:

If I have a fact table where a fact record naturally has a one-to-many relationship with a single dimension, how can a star schema be modeled to support this? For example:

  • Fact Table: Point of Sale entry (the measurement is DollarAmount)
  • Dimension Table: Promotions (these are sales promotions in effect when a sale was made)

The situation is that I want a single Point Of Sale entry to be associated with multiple different Promotions. These Promotions cannot each be their own dimension because there are far too many of them.

How do I do this?

A: 

Time is almost always a dimension in a star schema.

"In effect" suggests that there is a start and end date for a Promotion.

So a Promotion might itself be a fact that has a start and end date reference to the Time dimension.

Maybe with a model like this you could have a JOIN table to relate Sale to Promotion in a many-to-many fashion between facts.
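A minimal sketch of that JOIN (bridge) table idea, using Python's built-in sqlite3 module. The table and column names (`fact_sale`, `dim_promotion`, `bridge_sale_promotion`) and the sample rows are hypothetical, and for brevity Promotion is kept as a plain lookup table without the start and end date keys mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_promotion (promotion_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sale (sale_key INTEGER PRIMARY KEY, dollar_amount REAL);
-- Bridge table: one row per (sale, promotion) pair, so a sale can have
-- zero, one, or many promotions without adding columns to the fact table.
CREATE TABLE bridge_sale_promotion (
    sale_key INTEGER REFERENCES fact_sale(sale_key),
    promotion_key INTEGER REFERENCES dim_promotion(promotion_key),
    PRIMARY KEY (sale_key, promotion_key)
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO dim_promotion VALUES (?, ?)",
                 [(1, "Coupon A"), (2, "Coupon B"), (3, "Holiday")])
conn.executemany("INSERT INTO fact_sale VALUES (?, ?)",
                 [(100, 10.0), (101, 20.0), (102, 5.0)])
# Sale 100 used two promotions, sale 101 used one, sale 102 used none.
conn.executemany("INSERT INTO bridge_sale_promotion VALUES (?, ?)",
                 [(100, 1), (100, 2), (101, 2)])


def sales_by_promotion(conn):
    """Total DollarAmount of the sales associated with each promotion."""
    rows = conn.execute("""
        SELECT p.name, SUM(f.dollar_amount)
        FROM dim_promotion p
        JOIN bridge_sale_promotion b ON b.promotion_key = p.promotion_key
        JOIN fact_sale f ON f.sale_key = b.sale_key
        GROUP BY p.name
        ORDER BY p.name
    """).fetchall()
    return dict(rows)


totals = sales_by_promotion(conn)
# The grand total comes from the fact table alone, so each sale counts once.
grand_total = conn.execute(
    "SELECT SUM(dollar_amount) FROM fact_sale").fetchone()[0]
```

With this layout the per-promotion totals can add up to more than the grand total, because a sale with two promotions contributes to both groups.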

"Many, many" Promotions - yes, but how large is that? One per day means 365 records per year. I'll assume that Promotions are associated somehow with Products or Categories. A Sale would have a timestamp and multiple Products.

You have to store them somewhere, sometime or your model falls apart. Why the reluctance to model Promotion that way?

My advice would be to not worry about the size of the data and concentrate on modeling the problem as best you can. Get the logical model right first, then worry about the physical model and the data sizes.

duffymo
This is a fictitious example, but the reason I couldn't use time as a join is that a particular product can be under a promotion during a time period when not all products are under that same promotion. Think of coupons: these are promotions that can apply to specific products, but they can also apply to more than one product at a time. I know I have to store promotions somewhere, but having a dimension for every promotion would not work. I know enough at this point to know that I don't want a fact table with 3000 columns, most of them pointing to a "Not under this promotion" record.
Mike Gates
Everything I read about star schema says that time is the FIRST dimension you include. And I don't think it would be 3000 columns pointing to promotions, which would obviously break first normal form in any relational model. It would be a single key to the promotion at hand for that sale. How many apply to a single sale?
duffymo
I hoped it wouldn't get to this. This was a fictitious example, so of course logical holes would develop. How about this: each POS entry can be a member of one or more named groups. There can be any number of these groups, and they are assigned by a grouping table out in the normal relational world. I want to be able to query the cube (built from the star schema) to find out which POS entries fall into groups "Group 1", "Group 2" and "Group 3". And again, there can be any number of groups out there, as they are custom created by managers or something like that.
Mike Gates
Or, to continue the previous example, say that 0, 1, or more promotions can apply to a single sale.
Mike Gates
A: 

You should load a fact record for each promotion, even if the dollar amount is the same. If, in fact, each type of promotion in your example is truly represented by this specific dollar amount, then a fact record should be loaded with the key of the promotion type, along with keys back to the other related dimensions (including Date).

The main point here is: don't worry about data duplication. Think about a sales-oriented Data Warehouse for, say, a fast food company. One can assume there won't be just one fact record for $4.13 that is used to represent a million distinct sales of "value meal #3". Instead, each record in the "Transaction" dimension would have a relationship with at least one specific fact record in this hypothetical Sales fact table.
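A rough sketch of the one-fact-record-per-promotion approach, again with Python's sqlite3. The table `fact_sale_promotion`, its columns, the sample rows, and the use of key 0 as a "no promotion" row are all assumptions for illustration. It also shows a consequence worth keeping in mind: once the amount is repeated per promotion, a plain SUM over the fact table counts multi-promotion sales more than once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE fact_sale_promotion (
    sale_id INTEGER,        -- identifies the original POS entry
    promotion_key INTEGER,  -- 0 = hypothetical "no promotion" row
    date_key INTEGER,
    dollar_amount REAL
)""")

# Hypothetical sample data: one row per (sale, promotion) combination.
rows = [
    (100, 1, 20240101, 10.0),
    (100, 2, 20240101, 10.0),  # same sale, second promotion, amount repeated
    (101, 2, 20240101, 20.0),
    (102, 0, 20240102, 5.0),   # sale with no promotion
]
conn.executemany("INSERT INTO fact_sale_promotion VALUES (?, ?, ?, ?)", rows)

# Slicing by promotion works directly off the fact table.
per_promotion = dict(conn.execute("""
    SELECT promotion_key, SUM(dollar_amount)
    FROM fact_sale_promotion
    GROUP BY promotion_key
"""))

# A plain SUM counts sale 100 twice.
naive_total = conn.execute(
    "SELECT SUM(dollar_amount) FROM fact_sale_promotion").fetchone()[0]

# Collapsing to one row per sale first restores the true total.
true_total = conn.execute("""
    SELECT SUM(dollar_amount)
    FROM (SELECT DISTINCT sale_id, dollar_amount FROM fact_sale_promotion)
""").fetchone()[0]
```

Here the naive total is 45.0 while the total over distinct sales is 35.0, so queries that do not group by promotion need to deduplicate on the sale identifier.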

Jamey