views:

155

answers:

5

Consider a grocery store scenario (I'm making this up) where you have FACT records that represent a sale transaction, where the columns of the Fact table include

SaleItemFact Table
------------------
CustomerID  
ProductID  
Price  
DistributorID  
DateOfSale  
Etc  
Etc  
Etc  

Even if there are duplicates in the table when you consider ALL the keys, I would contend that a surrogate running numeric key (i.e. identity column) should be made up, e.g., TransactionNumber of type Integer.

I can see someone arguing that a Fact table might not have a unique key (though I'd invent one and waste the 4 bytes, but how about a dimension table?

+3  A: 

One reason, among many, to have a unique key per row (built from the data, or otherwise) is to facilitate updates or deletions to that specific row.

In any case, this question is kind of silly, because there really isn't an engineering trade-off at stake. There is no real proposed benefit to not having the key, so what's the point? True/yes, rows should have unique identifiers.

wsorenson
In Oracle, they all do; it's the ROWID pseudocolumn.
Adam Musch
+10  A: 

First normal form requires a primary key on every table. So this is the bare minimum required for good database design. What you choose for the primary key is open to much debate. But first normal form for database design is not.

Randy Minder
First normal form is for sissies. :) I'll upvote you anyway though as this is a pretty good answer
Mike Sherov
+2  A: 

For data warehouses, fact tables often have a composite primary key, usually the composite of all the foreign keys to your dimension tables.

It's rather common to not have any primary key in your fact tables as well, as they often serve no purpose other than wasting space - and for large datawarehouses the space can be quite big. Your dimension tables will have primary keys though.

If you're talking about the OLTP part of your grocery store, you would normally follow standard OLTP database design, normalize your tables and provide a primary key.

nos
A: 

True. Think conceptually, everything is unique even if its not defined by unique data. So if you enter data into a table and they have the exact same information, they are still unique as they were entered twice.

Leaving that, you gain the benefit of being able to easily select, update, delete based on the id at a relatively low cost (4 bytes). Which arguably, the larger the table, the more useful the id is. So the 4 bytes becomes less and less of a point the larger the table :-)

ocdcoder
-1: not everything is unique. See http://stackoverflow.com/questions/2390854/true-or-false-good-design-calls-for-every-table-to-have-a-primary-key-if-nothin/2390882#2390882.
John Saunders
oops, i stand corrected.
ocdcoder
+2  A: 

Because your question is under datawarehousing:

  • Dimension tables should have a surrogate (meaningless) primary key, usually an auto-increment integer; and a business key which uniquely identifies an object that the table row describes -- like an email address, full name or similar.

  • Fact tables mostly (almost always) have a primary key which is combination of two or more foreign keys.

There should be no duplicates in fact tables when combining foreign keys into the primary key. To test this, simply try to load the same transaction twice -- it should fail. An auto generated primary key will not prevent this, because it does not exists outside the warehouse. The problem can be usually solved by including the time-stamp into the primary key.

Sometimes a fact table is used as a dimension, or in a view that may act as a dimension. In this case it is convenient to have one (big)integer as a primary key, instead of several FK fields -- however, the original combination of FKs and time-stamp(s) should still uniquely identify the fact row.

Damir Sudarevic