In our application, we support user-written plugins.

Those plugins generate data of various types (int, float, str, or datetime), and each datum is labeled with a bunch of metadata (user, current directory, etc.) as well as three free-text fields (MetricName, Var1, Var2).

Now we have several years of this data, and I'm trying to design a schema which allows very fast access to those metrics in an analytical fashion (charts and stuff). This is easy as long as there are only a few metrics we're interested in, but we have a large number of different metrics at different granularities, and we'd like to store user-added data to allow for later analysis (possibly after a schema change).

Example data: (please keep in mind this is very simplified)

======================================================================================================
| BaseDir         | User   | TrialNo | Project | ... | MetricValue | MetricName  | Var1  | Var2      |
======================================================================================================
| /path/to/me     | me     | 0       | domino  | ... | 20          | Errors      | core  | dumb      |
| /path/to/me     | me     | 0       | domino  | ... | 98.6        | Temperature | body  |           |
| /some/other/pwd | oneguy | 223     | farq    | ... | 443         | ManMonths   | waste | Mythical  |
| /some/other/pwd | oneguy | 224     | farq    | ... | 0           | Albedo      | nose  | PolarBear |
| /path/to/me     | me     | 0       | domino  | ... | 70.2        | Temperature | room  |           |
| /path/to/me2    | me     | 2       | domino  | ... | 2020        | Errors      | misc  | filtered  |
Anyone can add a parser plugin to start measuring an AirSpeed metric, and we'd like our analysis tools to "just work" on that new metric.


Update:

Considering that many of the MetricNames are well known beforehand, I can satisfy my requirements if I enable analysis on those known metrics and simply store the remaining user-added ones. We can accept that new metrics won't be available for heavy-duty analysis without an edit to the schema.

What do you guys think of this solution?

I've divided our metrics into three fact tables: one for facts that don't need a MetricTopic, one for facts that do, and one for all the other metrics, including unexpected ones.

Metrics Schema #3
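(The linked schema image isn't reproduced here, so the following is only a hypothetical sketch of the three fact tables described above; all table and column names are invented for illustration, and the dimension tables the keys point at are omitted.)

    -- Hypothetical sketch of the three fact tables described above.
    -- Dimension keys (TrialKey, MetricKey, TopicKey) are assumed to reference
    -- separate dimension tables, omitted here for brevity.
    CREATE TABLE FactPlain (          -- facts that don't need a MetricTopic
        TrialKey  INTEGER,            -- key into a trial/run dimension
        MetricKey INTEGER,            -- key into a metric dimension
        Value     DOUBLE PRECISION
    );

    CREATE TABLE FactWithTopic (      -- facts that do need a MetricTopic
        TrialKey  INTEGER,
        MetricKey INTEGER,
        TopicKey  INTEGER,            -- key into a MetricTopic dimension
        Value     DOUBLE PRECISION
    );

    CREATE TABLE FactOther (          -- all other metrics, incl. unexpected ones
        TrialKey   INTEGER,
        MetricName VARCHAR(64),       -- free text, as in the original rows
        Var1       VARCHAR(64),
        Var2       VARCHAR(64),
        Value      VARCHAR(64)        -- stored, but not used for heavy analysis
    );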


For the bounty:

I'll accept any critique which shows how to make this system more functional, or brings it into closer alignment with industry best practices. References to the literature give added weight.

A: 

I could add another column for every metric we care about, but that could range into the hundreds or even thousands. I'd have to write a script just to update the schema, and that smells like bad design.

You don't have that many facts. There aren't that many units.

Facts have units. Seconds, pounds, bytes, dollars.

You need to review the "Star Schema" design. You have dimensions (probably a lot) and measurable facts (probably very few).

You have a join between facts and all of the associated dimensions. You can do sum, count on the facts, and group-by on the dimensions.

You can't have thousands of independent facts. That's almost impossible. But you can have thousands of combinations of dimensions; that's common.

Separate facts (measurable quantities that add pleasantly) from dimensions (definitional qualities) and you should have a lot of dimensions around a few facts.
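As a concrete illustration of that split, an analytical query against a star schema joins one fact table to its dimensions, aggregates the facts, and groups by the dimension attributes. The table and column names below are hypothetical, not taken from the question:

    -- Hypothetical star-schema query: SUM/COUNT over facts,
    -- GROUP BY over dimension attributes.
    SELECT d_user.UserName,
           d_metric.MetricName,
           SUM(f.Value) AS total_value,
           COUNT(*)     AS n_rows
    FROM   FactMeasurement f
    JOIN   DimUser   d_user   ON d_user.UserKey     = f.UserKey
    JOIN   DimMetric d_metric ON d_metric.MetricKey = f.MetricKey
    GROUP BY d_user.UserName, d_metric.MetricName;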

Buy a copy of Kimball.

S.Lott
I actually bought Kimball on Kindle just after posting this question.
bukzor
@S.Lott: I agree. If I look closely, I only have maybe 40 different facts, but I still have the problem that someone can add a new type of fact without warning, requiring an update to the schema. Is there any common wisdom for a system with indeterminate facts? Make a dimension called FactName?
bukzor
+4  A: 

If I understand correctly, you are looking for a schema to support on-the-fly creation of measures in a DW. In a classical data warehouse each measure is a column, so in a Kimball star you would need to add a column for each new measure -- change the schema.

What you have is an EAV model, and analytics on EAV is not easy and not fast -- take a look at this discussion.
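To make that concrete: with an EAV layout, putting even two metrics side by side means one conditional aggregation (or self-join) per metric, so every new metric changes the queries rather than the schema. A hypothetical example against the wide table sketched earlier in the question:

    -- EAV-style pivot: one MAX(CASE ...) per metric of interest.
    -- Each newly invented metric requires editing queries like this,
    -- which is what makes ad-hoc analytics over EAV data slow and awkward.
    SELECT TrialNo,
           MAX(CASE WHEN MetricName = 'Errors'      THEN MetricValue END) AS Errors,
           MAX(CASE WHEN MetricName = 'Temperature' THEN MetricValue END) AS Temperature
    FROM   MetricRecord
    GROUP BY TrialNo;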

I would suggest you look at tools like Splunk, which are suited to this type of problem.

Damir Sudarevic
@Damir: Thanks! At least I have a name for my problem now. Do you know of any authoritative writing on analytics for EAV?
bukzor
@bukzor, no not really.
Damir Sudarevic