views:

107

answers:

4

I am in the process of designing a new customer fact and dimensions for my warehouse. In my search for good example models I noticed something odd. No one seems to have a customer-centric fact. Every example I found has a transaction event such as a sale or order as the central fact with customers as a dimension. This raises a question for me.

Am I doing something seriously wrong by wanting a customer fact? The goal is to enable analysis of customer behavior such as order frequency, total spend, acquisition cost, distinct, product count,...etc These questions naturally imply a fact to me not a dimension. I already have an order fact that is great for order-centric queries but is not good for customer-centric queries.

To give you a little more detail the Customer Fact will likely have the following measures and dimensions: Measures: Count of Customers Distinct Product Count Completed Order Count Total Revenue Total Cost Count of Coupons Received Count of Coupons Redeemed Cost of Coupons Redeemed

Dimensions: Order Delivery Date Order Delivery Time Order Delivery Geography Acquisition Source Order Type Coupon Type

The above seems pretty natural to me but I am concerned I am missing an obvious no-no by taking a customer-centric approach in this new cube.

A: 

I may have misunderstood your question, but let's see what can be learned about customer behaviour from the factOrder only, the old-fashioned way.

Assuming that grain of the factOrder is one line on the order and that there is OrderID as a degenerate dimension.

-- Number of customers who ordered something at least once
select
    count(distinct CustomerKey) as PayingCustomers
from factOrder ;

.

-- Number of orders and sales per customer
select 
      CustomerKey
    , count(distinct OrderID) as NumberOfOrders
    , sum(ExtendedPrice)      as Total
from factOrder
group by CustomerKey ;

.

-- Histogram (x = NumberOfOrders, y = People, Amount)
with
orders_per_customer as (
    select 
      CustomerKey
    , count(distinct OrderID) as cnt
    , sum(ExtendedPrice)      as Total
    from factOrder
    group by CustomerKey
)
select
      cnt        as NumberOfOrders
    , count(1)   as People
    , sum(Total) as Amount
from orders_per_customer
group by cnt
order by cnt asc ;

.

-- Distinct products ordered by customer
select 
      CustomerKey
    , count(distinct ProductKey) as DistinctProductsOrdered
from factOrder
group by CustomerKey ;

.

-- Histogram (x = NumberOfDistinctProducts, y = People)
with
products_per_customer as (
    select 
      CustomerKey
    , count(distinct ProductKey)  as cnt
    from factOrder
    group by CustomerKey
)
select
      cnt       as NumberOfDistinctProducts
    , count(1)  as People
from products_per_customer
group by cnt
order by cnt asc ;
Damir Sudarevic
My business users interact with the cube not the warehouse. The cube interaction is mediated by either Excel or Cognos. While I can perform such queries quite easily in either SQL or MDX my business users cannot. This is another reason I seem to be driving toward a customer-cube as an addition to my order cube.
Kevin D. White
+2  A: 

We have customer facts. A lot of the time they are factless fact tables which just link several dimensions.

Sounds like a lot of your facts are derived or summary. Grain will still be important. If you say order count is that MTD (and what date) or for all time, etc.

I don't think there is anything wrong with that, but I think because this is derived data, most people would put it in a "data mart" or whatever the best unambiguous term for a subset for analysis.

I agree that modeling it the same way is perfectly valid. The only thing you have to watch out for is the same with all derived data, it needs to remain consistent.

Your customer will have a dimension (conformed, since it is shared between models) and then a CustomerStats fact table or whatever, with every fact at that grain which shares all those dimensions.

Cade Roux
You make some good really good points here. Especially about factless fact tables. To be clear however that is not what I am proposing to build.
Kevin D. White
+1  A: 

The reason so many systems are order-centric instead of customer-centric is that how you identify the customer changes so frequently over time: previously treating a business as the customer evolves into treating individual employees as the customers or vice versa, or a customer will change/split/merge addresses, or a business changes its name and we want to consolidate (or segregate) old and new performance totals, or a shipping address and billing address now have to be expanded to include a support address, or operators forget or mistake one address purpose for another, or a customer wants to use a special shipping address only temporarily, or etc etc.

This is addressed in more detail here.

joe snyder
Luckily we don't have to deal with such changes in the definition of a customer in my office. While we do have both a corporate and a retail side of the business every individual is considered a customer for the business. While we do analyze data at corporate enterprise level this analysis is strictly order-based and can be done quite readily with my existing cube.The customer cube is something new to allow me and the analysts to focus on customers as a separate entity and ask questions about customer behavior and demographics. At least that is my hope.
Kevin D. White
A: 

Measures such as order frequency, total spend, acquisition cost, distinct product count are actually derived from Orders as a fact table, with Customers as a dimension. Aggregating per customer could just as easily been aggregating per product or per geographic location.

As Cade Roux has suggested, you can build a customers aggregated table, which should be detached from the other fact tables, however that would purely be a performance decision. You retain maximum flexibility by constructing Customers as a dimension of Orders.

shmichael
I do already have a order fact with multiple dimensions in use. One of the dimensions is customers. However the problem is many of the questions my business users are trying to answer around behavior are extremely difficult to answer with the existing order fact and dimensions. I could change the design but that would have additional downstream impact on my users. Part of the driver for this new customer fact is so that I can leave my existing fact in place and minimize the impact to my users.
Kevin D. White
Is it extremely difficult due to performance issues or due to schema limitations?If it is schema limitations - could you provide an example or two?
shmichael