I'm building a data warehouse that includes delivery information for restaurants. The data is stored in SQL Server 2005 and is then put into a SQL Server Analysis Services 2005 cube.
The Deliveries information consists of the following tables:
FactDeliveres
- BranchKey
- DeliveryDateKey
- ProductKey
- InvoiceNumber (DD: degenerate dimension)
- Quantity
- UnitCosT
- Linecost
Note:
- The granularity of FactDeliveres is each line on the invoice
- The Product dimension include supplier information
And the problem: there is no primary key for the fact table. The primary key should be something that uniquely identifies each delivery plus the ProductKey. But I have no way to uniquely identify a delivery.
In the source OLTP database there is a DeliveryID that is unique for every delivery, but that is an internal ID that meaningless to users. The InvoiceNumber is the suppliers' invoices number -- this is typed in manually and so we get duplicates.
In the cube, I created a dimension based only on the InvoiceNumber field in FactDeliveres. That does mean that when you group by InvoiceNumber, you might get 2 deliveries combined only because they (mistakenly) have the same InvoiceNumber.
I feel that I need to include the DeliveryID (to be called DeliveryKey), but I'm not sure how.
So, do I:
- Use that as the underlying key for the InvoiceNumber dimension?
- Create a DimDelivery that grows every time there is a new delivery? That could mean that some attributes come out of FactDeliveries and go into DimDelivery, like DeliveryDate,Supplier, InvoiceNumber.
After all that, I could just ask you: how do I create a Deliveries cube when I have the following information in my source database
DeliveryHeaders
- DeliveryID (PK)
- DeliveryDate
- SupplierID (FK)
- InvoiceNumber (typed in manually)
DeliveryDetails
- DeliveryID (PK)
- ProductID (PK)
- Quantity
- UnitCosT