Consider a dimensional model with fact tables like (fk_dim1value, fk_dim2value, ..., value) where the fk_X columns are foreign keys into corresponding trivial dimension tables dim1value (id, value), dim2value (id, value), etc.

These fact-and-dimension tables are collected automatically from disparate sources, so there are a lot of them ... and they are redundant: all the dimension value tables are structurally identical, (id, value), representing simple collections of textual values with no further semantics (the only difference being the various fact-table foreign keys that reference them). Less trivial dimension types will probably come up later, but the set of distinct dimension types will remain small.

So I want to merge the dimension tables into one table dimvalue (fk_dim, dimvalue_id, value) where fk_dim references a table dimension (dim_id, name), and dimvalue_id is unique only within each dimension. The natural primary key is then composite: (fk_dim, dimvalue_id).

The fact table foreign-key columns now all reference the same table, dimvalue (fk_dim, dimvalue_id, value) ... but of course each column is associated with a particular dimension and thus should still be limited to referencing the values of that dimension specifically (a horizontal partition of the unified table dimvalue).

Is there a (sensible) way to do this?

I mean something like a “half-composite” foreign key, i.e. a single-column reference to a “slice” of a composite PK, with a fixed value for the other column(s). A “fully-composite” FK would be FOREIGN KEY (col1, col2) REFERENCES dimvalue (fk_dim, dimvalue_id), but here fk_dim is fixed, so the referencing side of the key is just one column, pointing at the second column of the dimvalue primary key; something like FOREIGN KEY (fk_dim7value) REFERENCES dimvalue (fk_dim=7, dimvalue_id).

Is something like that possible? Or am I losing my way in this last paragraph? Should I give up and just foreign-key to the whole dimvalue table and then add check constraints to limit by dimension? Or does referential integrity require me to give up even more and just accept all the separate identical tables?
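For concreteness, the check-constraint fallback I mention can be emulated with a companion column per fact-table FK whose value is pinned by a CHECK, so that the composite foreign key can only reach one dimension's rows. A minimal sketch in SQLite via Python (all table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only with this on

conn.executescript("""
CREATE TABLE dimension (dim_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dimvalue (
    fk_dim      INTEGER NOT NULL REFERENCES dimension (dim_id),
    dimvalue_id INTEGER NOT NULL,
    value       TEXT,
    PRIMARY KEY (fk_dim, dimvalue_id)
);

-- Fact table: each FK column gets a companion column fixed by a CHECK,
-- so the composite FK below can only match rows of dimension 7.
CREATE TABLE fact (
    dim7         INTEGER NOT NULL CHECK (dim7 = 7),
    fk_dim7value INTEGER NOT NULL,
    value        REAL,
    FOREIGN KEY (dim7, fk_dim7value) REFERENCES dimvalue (fk_dim, dimvalue_id)
);
""")

conn.execute("INSERT INTO dimension VALUES (7, 'dim7')")
conn.execute("INSERT INTO dimvalue VALUES (7, 1, 'some value')")
conn.execute("INSERT INTO fact VALUES (7, 1, 42.0)")      # accepted

try:
    conn.execute("INSERT INTO fact VALUES (8, 1, 0.0)")   # wrong dimension
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Any engine with composite foreign keys and check constraints (PostgreSQL, SQL Server, etc.) supports the same pattern: a row whose pinned column differs from 7, or whose fk_dim7value has no matching dimvalue row, is rejected.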

(Impact of constraints on write performance is not important; read performance is a design goal.)

A: 

You've stated these key considerations:

  • Data is collected from disparate systems, so I conclude this is a "reporting" schema rather than an "operations" or transactional system
  • Each fact table contains one piece of business data per row, i.e. the "value" column
  • Your fact table seems to contain only one "measure" or "fact"
  • Write performance is irrelevant; only read performance is the goal. This affirms my conclusion that this is a "reporting" schema

Considering you're after fast read performance, I'd go for the "big table" design. Granted, the big-table design is HORRIBLE for transactional systems, but this isn't one. By big table I mean TABLE(DIM1VALUE, DIM2VALUE, DIM3VALUE, DIM4VALUE, ..., DIMNVALUE, FACTVALUE)

Your dimension tables held only one column of business data each anyway, so skip the lookups. Index every column (except the fact value) and then test your queries for performance.
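The layout above can be sketched as follows (SQLite via Python; table, column, and index names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE bigtable (
    dim1value TEXT,
    dim2value TEXT,
    dim3value TEXT,
    factvalue REAL   -- the single measure; deliberately not indexed
)""")

# One index per dimension attribute, as suggested above.
for col in ("dim1value", "dim2value", "dim3value"):
    conn.execute(f"CREATE INDEX idx_{col} ON bigtable ({col})")

conn.execute("INSERT INTO bigtable VALUES ('a', 'b', 'c', 1.5)")
rows = conn.execute(
    "SELECT factvalue FROM bigtable WHERE dim2value = 'b'"
).fetchall()
print(rows)  # [(1.5,)]
```

Filters on any dimension column can then use its index directly, with no join at all.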

While you load the big table, you can check the values for data quality and flag or address those outside the expected ranges.

Now if your number of dimension tables were excessive, you could split the big table into groups based on logical usage, e.g. if 10 of the dimension attributes are always used together, keep them together in BIGTABLE1, and so on.

esabine
Thanks; you're right, it's a “reporting” schema, but I do not like the “big table” approach. It would blow up the width of the fact tables (dimension values can be wide), making queries I/O-heavy. It would not save me much, since I can often avoid the join to the dimension tables anyway. And it would tie me to this trivial-dimension-values-only model, which is not permanent (I was not clear about that, sorry; I have amended the question).
Gunnlaugur Briem