views:

186

answers:

4

I'm new to modeling star schemas, fresh from reading the Data Warehouse Toolkit.

I have a business process that has clients and non-clients calling into conference calls with some of our employees.

My fact table, call it "Audience", will contain a measure of how long an attending person was connected to the call, and the cost of this person's connection to the call. The grain is "individual connection to the conference call".

Should I use my conformed Client dimension and create a non-client dimension (for the callers that are not yet clients) this way (omitting dimensions that are not part of this questions):

First potential model

Or would it be OK/better to have a non-conformed Attending dimension related to the conformed Client dimension in this manner:

Second potential model

Or is there a better/standard mechanism to model business processes like this one?

Edit:

What about using model 2 above, but creating a view on top of the client dimension table and the attending dimension to make it look like it is only one dimension?

Is that an acceptable alternative to Damir's answer below?

A: 

It makes little difference. The second version is possibly more correct, but does your olap system support this?

TFD
+2  A: 

I would go for the second: it models the attendees in their own, dedicated dimension, whilst allowing you to expose their client-ness (or otherwise) via an attribute in that dimension, which is probably the way you would want to drill down in real life ("show me all attendees" followed by "and now which of those are clients").

In your client dimension I would populate the client_id for all attendees, matching to an "unknown" element where the attendee is not a client.

There's a nice discussion about this here:

http://crpit.com/confpapers/CRPITV75Riazati.pdf

davek
A: 

The second one looks like "snowflake schema" to me. Look into snowflake schema, starting with the wikipedia article. You'll see several comparisons between star and snowflake.

Walter Mitty
That is exactly the reason I posted the question. But it allows for a drill down behavior that is not possible otherwise.
celopes
+2  A: 

There is no need to split clients into two tables (dimensions). Simply put all clients, active and prospects into the same dimension table. You could then introduce an IsActive attribute (column) to distinguish between paying clients and prospects. Sooner or later you will use a data mining tool to learn more about clients and what distinguishes people who are willing to pay for your service from those who are not. In order for the algorithm to work, you have to supply data for both groups of people -- those who are paying and those who are not paying. To summarize, prospects belong to the same table as paying clients.

With this, you can use your model No 1. Make sure that measures in the fact table make sense. For example if a call_id =123 had 10 people participating, then

sum(cost_of_connection)
from factAudience
where call_id = 123;

should return the total cost of the call, not something meaningless -- like 10x the real cost.

EDIT

A "paying client" and a "prospect client" are both a type of a client, therefore belong to the same dimension table -- dimClient. Somewhere in the DW there is a factSale (or similar) with FK to the dimSale. Even if you do not have a column in dimClient to differentiate between paying and prospects -- you can still get paying clients by joining factSale and dimClient.

"Who is a customer?" is a common debate when introducing a DW in an organization. In order to be able to analyze client acquisition, retention, conversion, etc., prospects have the same treatment as paying customers -- at least in the DW. Keep in mind that acquiring and creating new customers is on the top of the list for (almost) any CEO.

Damir Sudarevic
So, your suggestion is to use the ETL layer to create a new dimension merging the conformed client dimension and the list of conference call attendees. Isn't that introducing a lot of complexity since my conformed client dimension is a slowly changing dimension? Now I have to not only maintain the conformed client dimension up-to-date with the operational data; but also this new dimension...
celopes
See the edit to my question. Would that be an acceptable compromise in terms of modeling?
celopes
I'm going to select your answer as the accepted (albeit I would like to wait for your answer to my edit, I only have one hour left in the bounty and I want to make sure the best answer gets the credit). If you don't mind I would still like to debate the pros and cons of an extra dimension vs. a view on top of two dimensions.
celopes
No, not a new one -- see the edit.
Damir Sudarevic
@Damir Sudarevic: After reading your edit, I needed to go ponder for a while. You are right. I think I'm missing an operational step to qualify these callers as prospects and base the datamart on that. Then all clients and prospects should live in the same dimension. I really appreciate your help!
celopes