views:

991

answers:

2

I'm building a data warehouse that includes delivery information for restaurants. The data is stored in SQL Server 2005 and is then put into a SQL Server Analysis Services 2005 cube.

The Deliveries information consists of the following tables:

FactDeliveres

  • BranchKey
  • DeliveryDateKey
  • ProductKey
  • InvoiceNumber (DD: degenerate dimension)
  • Quantity
  • UnitCosT
  • Linecost

Note:

  • The granularity of FactDeliveres is each line on the invoice
  • The Product dimension include supplier information

And the problem: there is no primary key for the fact table. The primary key should be something that uniquely identifies each delivery plus the ProductKey. But I have no way to uniquely identify a delivery.

In the source OLTP database there is a DeliveryID that is unique for every delivery, but that is an internal ID that meaningless to users. The InvoiceNumber is the suppliers' invoices number -- this is typed in manually and so we get duplicates.

In the cube, I created a dimension based only on the InvoiceNumber field in FactDeliveres. That does mean that when you group by InvoiceNumber, you might get 2 deliveries combined only because they (mistakenly) have the same InvoiceNumber.

I feel that I need to include the DeliveryID (to be called DeliveryKey), but I'm not sure how.

So, do I:

  1. Use that as the underlying key for the InvoiceNumber dimension?
  2. Create a DimDelivery that grows every time there is a new delivery? That could mean that some attributes come out of FactDeliveries and go into DimDelivery, like DeliveryDate,Supplier, InvoiceNumber.

After all that, I could just ask you: how do I create a Deliveries cube when I have the following information in my source database

DeliveryHeaders

  • DeliveryID (PK)
  • DeliveryDate
  • SupplierID (FK)
  • InvoiceNumber (typed in manually)

DeliveryDetails

  • DeliveryID (PK)
  • ProductID (PK)
  • Quantity
  • UnitCosT
+2  A: 

I would have Quantity, UnitCode, InvoiceNumber, DeliveryID all in the fact table. Both InvoiceNumber and DeliveryID are degenerate dimensions, because they will change with every fact (or very few facts). It is possible that you could put them in their own dimension if you have a large number of items on each order. The model below may not be 100% correct if you have multiple deliveries on an invoice, but it will be close. Check out Kimball, he might have an example of a star schema for this business scenario.

Fact table:
OrderDateID (not in your model, but probably should be, date dimension in a role)
DeliveryDateID (date dimension in a role)
SupplierID (supplier dimension surrogate key)
InvoiceID (invoice dimension surrogate key)
ProductID (product dimension surrogate key)
Quantity (fact)
UnitCost (fact)
InvoiceNumber (optional)
DeliveryID (optional)

with the usual date dimension table and the following dimensions:

Supplier Dim:
SupplierID (surrogate)
SupplierCode and data

Invoice Dim:
InvoiceID (surrogate)
InvoiceNumber (optional)
DeliveryID (optional)

Product Dim:
ProductID (surrogate)
ProductCode and Data

Always remember, your (star schema) data warehouse is not going to be structured at all like your OLTP data - it's all about the facts and what dimensions describe them.

Cade Roux
A: 

Fact table PK's are almost always surrogate keys. Each fact is part of several dimensions, so the fact has FK's to the dimensions, but no real keys of it's own.

A Delivery Fact (a Line Item) belongs to a Branch, it has a Product, it is part of a larger Delivery, it occurs on a particular Date. Sounds like 4 independent dimensions.

The Delivery dimension has it's own PK and it has a dimension attribute of invoice number. Plus, perhaps, other attributes of the delivery as a whole.

Each Delivery Line Item Fact is associated with one Delivery and the invoice number for that Delivery.

S.Lott