views:

800

answers:

2

Background) I've gone through the process of building a fact table for our inventory data that will in theory act as a nightly snapshot of our warehouse. What is recorded is information such as quantity, weights, locations, statuses, etc. The data is very granular and in many cases not specifically related to a single entity (our source database records inventory data as having three primary keys: licenseplate aka pallet, product, and packaging type - so it has essentially 3 business keys and no surrogate key).

The goal is to be able to have a 100% accurate recreation of our warehouse management system's data, that is viewable for any one day in history. So I can look up and see how many pallets of product XYZ was in location 1234 on the 4th of August.

Question 1) Now, I have built this fact table to structurally look like a Slowly Changing Dimension, Type 2. Is this wrong? I've been reading up a little on accumulating snapshot fact tables and i'm beginning to question my design. What is the best practice in this situation?

Question 2) If my design is ok, how do I configure Analysis services so that it recognizes my DateStart and DateEnd columns in the FACT table? I have found some information on how to configure this for dimensions but it does not seem to work/apply to fact tables.

For reference - My fact table's structure (with added notes about columns):

CREATE TABLE [dbo].[FactInventory](     
[id] [int] IDENTITY(1,1) NOT NULL,  (fact table only surrogate key)
[DateStart] [datetime] NULL,    (record begin date)
[DateEnd] [datetime] NULL,       (record end date)
[CreateDate] [datetime] NULL,    (create date of the inventory record in src db)
[CreateDateId] [int] NULL,       (create date dimension key)
[CreateTimeId] [int] NULL,       (create time dimension key)
[LicensePlateId] [int] NULL,     (pallet id dimension key)
[SerialNumberId] [int] NULL,     (serial number id dimension key)
[PackagedId] [int] NULL,         (packaging type id dimension key)
[LotId] [int] NULL,          (inventory lot id dimension key)
[MaterialId] [int] NULL,         (product id dimension key)
[ProjectId] [int] NULL,          (customer project id dimension key)
[OwnerId] [int] NULL,        (customer id dimension key)
[WarehouseId] [int] NULL,     (warehouse id dimension key)
[LocationId] [int] NULL,      (location id dimension key)
[LPStatusId] [int] NULL,      (licenseplate status id dimension key)
[LPTypeId] [int] NULL,    (licenseplate type id dimension key)
[LPLookupCode] [nvarchar](128) NULL, (licenseplate non-system name)
[PackagedAmount] [money] NULL,  (inventory amount - measure)
[netWeight] [money] NULL,   (inventory netWeight - measure)
[grossWeight] [money] NULL, (inventory grossWeight - measure)
[Archived] [bit] NULL,  (inventory archived yes/no - dimension)
[SCDChangeReason] [nvarchar](128) NULL (auditing data for changes)
A: 

Before going any further, is inventory really a slowly changing fact?

Edit: Then why not just snapshot every product each day, since that's what you want.

The problem is that fact tables get large and you're throwing EVERYTHING into the fact table unnecessarily. Ideally, the fact table will contain nothing more than foreign keys to dimensions and data only pertaining to the fact at hand. But some of the columns you've outlined look like they belong in one of the dimensions tables whereas

For instance, the license plate information. Status, type, and lookup code. Likewise with netWeight/grossWeight. They should be derivable from the product dimension and PackagedAmount.

CREATE TABLE [dbo].[FactInventory](     
[id] [int] IDENTITY(1,1) NOT NULL,  (fact table only surrogate key)
[day] [int] NULL,                (day dimension key, grain of a day)
[CreateDateId] [int] NULL,       (create date dimension key)
/* I take these are needed?
 * [CreateTimeId] [int] NULL,       (create time dimension key)
 * [CreateDate] [datetime] NULL,    (create date of the inventory record in src db)
 */
[LicensePlateId] [int] NULL,     (pallet id dimension key)
/* Now THESE dimension columns...possibly slowly changing dimensions?
[LPStatusId] [int] NULL,             (licenseplate status id dimension key)
[LPTypeId] [int] NULL,               (licenseplate type id dimension key)
[LPLookupCode] [nvarchar](128) NULL, (licenseplate non-system name)
*/
[SerialNumberId] [int] NULL,     (serial number id dimension key)
[PackagedId] [int] NULL,         (packaging type id dimension key)
[LotId] [int] NULL,              (inventory lot id dimension key)
[MaterialId] [int] NULL,         (product id dimension key)
[ProjectId] [int] NULL,          (customer project id dimension key)
[OwnerId] [int] NULL,            (customer id dimension key)
[WarehouseId] [int] NULL,        (warehouse id dimension key)
[LocationId] [int] NULL,         (location id dimension key)
[PackagedAmount] [money] NULL,   (inventory amount - measure)
[netWeight] [money] NULL,        (inventory netWeight - measure)
[grossWeight] [money] NULL,      (inventory grossWeight - measure)
[Archived] [bit] NULL,           (inventory archived yes/no - dimension)
[SCDChangeReason] [nvarchar](128) NULL (auditing data for changes)
hythlodayr
I guess you could say its a fast changing fact but I would like to capture a "snapshot" of all these details at least once daily.
Sangheili
+2  A: 

Typically, in a snapshot fact table you do not have changes.

You usually have a date/time dimension which is used for the granularity of the measurements and not a DateStart/DateEnd. Similarly you do not have any SCD information. The fact snapshot is taken and the Date and Time dimensions are attached to those facts. If those facts repeat identically each month, so be it.

Dealing with determining which facts are valid at a given time is more processing than you really want your DW or your ETL to handle - that kind of design (effective dates, etc) is more effectively used in a live OLTP-type system where complete history is kept in the live system. The point of the DW is to optimize for reporting, not for space, and thus there is a direct snapshot date/time dimension which allows you to easily index and potentially partition the data without a lot of date arithmetic or comparisons.

As far as your dimensional model, be careful that you aren't succumbing to the too-many dimensions problem. Remember that dimensions do not have to correspond to entities in the real world. The choice of how dimensional attributes are grouped into dimension tables should be informed by 1) query needs, 2) data affinity and change behavior, 3) business organization. You might want to look into using one or more junk dimensions.

Cade Roux