I'm getting my first exposure to data warehousing, and I'm wondering whether it is necessary to have foreign key constraints between facts and dimensions. Are there any major downsides to not having them? I'm currently working with a relational star schema. In traditional applications I'm used to having them, but I started to wonder whether they are needed in this case. I'm working in a SQL Server 2005 environment.

UPDATE: For those interested, I came across a poll asking the same question.

+1  A: 

I don't know about necessary, but I feel they are good for data integrity reasons. You want to make sure that your fact table is always pointing to a valid record in the dimension table. Even if you are sure this will happen, why not have the database validate the requirement for you?
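As a rough illustration of letting the database do the validation, here is a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in for SQL Server so that it runs anywhere; the table and column names (`dim_product`, `fact_sales`, `product_key`) are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
conn.execute("CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """
    CREATE TABLE fact_sales (
        sale_id INTEGER PRIMARY KEY,
        product_key INTEGER NOT NULL REFERENCES dim_product (product_key),
        amount REAL
    )
    """
)
conn.execute("INSERT INTO dim_product VALUES (1, 'Widget')")
conn.execute("INSERT INTO fact_sales VALUES (100, 1, 9.99)")  # valid key: accepted

try:
    # Hypothetical bad row: product_key 42 does not exist in the dimension.
    conn.execute("INSERT INTO fact_sales VALUES (101, 42, 5.00)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the database refused the orphan fact row

fact_rows = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
print(rejected, fact_rows)
```

The bad row never reaches the fact table, so nothing downstream has to clean it up.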

jaltiere
+2  A: 

I think that in theory you need them, but it depends on how you distribute your data across databases. If all the tables are in the same database, foreign keys can help, because declaring them helps the database select faster based on the indexing. If your tables are spread over several databases, you need to enforce the relationships at the application level.

You can have your database check it for you, but that can be slow. And generally, in a data warehouse, we don't care about redundancy or integrity: we already have a lot of data, and a few integrity problems or redundant rows will not noticeably affect the aggregated results.

vodkhang
I mostly concur. Though I would have worded it that "Having Foreign Keys allows the database to choose the right index because it KNOWS that the relationships exist." So I think you should have them, but I don't think you need them.
MJB
Yeah, as I said, we should have them because of the better performance and integrity. But if we have to get rid of them, just go ahead :)
vodkhang
+1 Good points about indexing and performance.
Garett
+1  A: 

The reasons for using integrity constraints in a data warehouse are exactly the same as in any other database: to guarantee the integrity of the data. Assuming you and your users care about the data being accurate, then you need some way of ensuring that it remains so and that business rules are being correctly applied.

dportas
+5  A: 

Most data-warehouses (DW) do not have foreign keys implemented as constraints, because:

  • In general, a foreign key constraint would trigger on an insert into a fact table, on any key update, and on a delete from a dimension table.

  • During loading, indexes and constraints are dropped to speed up the load; data integrity is enforced by the ETL application.

  • Once tables are loaded, DW is essentially read-only -- the constraint does not trigger on reads.

  • Any required indexes are re-built after the loading.

  • Deleting in a DW is a controlled process. Before deleting rows from dimensions, fact tables are queried for keys of rows to be deleted -- deleting is allowed only if those keys do not exist in any of the fact tables.

Just in case, it is common to periodically run queries to detect orphan records in fact tables.
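The orphan check and the controlled delete described above could look roughly like the sketch below. It is only an illustration: it uses Python's `sqlite3` module instead of SQL Server, the tables deliberately have no foreign key constraint, and the names (`dim_customer`, `fact_orders`, `find_orphans`, `delete_dimension_row`) are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No FK constraint is declared, mirroring a warehouse that relies on ETL checks.
conn.executescript(
    """
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_orders (order_id INTEGER PRIMARY KEY, customer_key INTEGER, amount REAL);
    INSERT INTO dim_customer VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO fact_orders VALUES (10, 1, 20.0), (11, 2, 35.5), (12, 99, 12.0);
    """
)

def find_orphans(conn):
    """Return fact rows whose customer_key has no matching dimension row."""
    return conn.execute(
        """
        SELECT f.order_id, f.customer_key
        FROM fact_orders AS f
        LEFT JOIN dim_customer AS d ON d.customer_key = f.customer_key
        WHERE d.customer_key IS NULL
        ORDER BY f.order_id
        """
    ).fetchall()

def delete_dimension_row(conn, customer_key):
    """Delete a dimension row only if no fact row references it."""
    in_use = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM fact_orders WHERE customer_key = ?)",
        (customer_key,),
    ).fetchone()[0]
    if in_use:
        return False
    conn.execute("DELETE FROM dim_customer WHERE customer_key = ?", (customer_key,))
    return True

orphans = find_orphans(conn)                 # order 12 points at missing key 99
deleted_unused = delete_dimension_row(conn, 3)  # Carol has no orders
deleted_used = delete_dimension_row(conn, 1)    # Alice is still referenced
print(orphans, deleted_unused, deleted_used)
```

The same `LEFT JOIN ... IS NULL` query shape works in SQL Server; run as a scheduled job, it serves as the periodic orphan check.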

Damir Sudarevic
Thanks for the great feedback Damir. The fact that the system is essentially read only was what made me start to question the need for constraints.
Garett
Not sure if I should add here or to the similar question but... If integrity is an issue you can always write integrity functions or stored procedures that look for "orphaned" facts (rows where the foreign keys don't make sense). You can then clean those up after/during/before the next cycle of loads on your database.
Markus
A: 

We use them, and we're happy with it.

http://stackoverflow.com/questions/2690818/is-it-good-practice-to-have-foreign-keys-in-a-datawarehouse-relationships/2716509#2716509

There is overhead, but you can always disable the constraint during load and then re-enable it.

Having the constraint in place can catch ETL bugs and modelling defects.
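A small sketch of the "switch off for the load, validate afterwards" pattern follows. It is an approximation, not the SQL Server procedure: it uses Python's `sqlite3` module, where `PRAGMA foreign_keys` toggles enforcement and `PRAGMA foreign_key_check` reports violations. In SQL Server the corresponding statements would be `ALTER TABLE ... NOCHECK CONSTRAINT` before the load and `ALTER TABLE ... WITH CHECK CHECK CONSTRAINT` afterwards, which re-validates existing rows. The tables and the deliberately bad row are invented for the example.

```python
import sqlite3

# isolation_level=None gives autocommit, so the PRAGMA toggles take effect immediately.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript(
    """
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, iso_date TEXT);
    CREATE TABLE fact_visits (
        visit_id INTEGER PRIMARY KEY,
        date_key INTEGER REFERENCES dim_date (date_key),
        hits INTEGER
    );
    INSERT INTO dim_date VALUES (20100101, '2010-01-01'), (20100102, '2010-01-02');
    """
)

conn.execute("PRAGMA foreign_keys = OFF")  # skip per-row checks during the bulk load
batch = [(1, 20100101, 5), (2, 20100102, 7), (3, 20100199, 2)]  # last key simulates an ETL bug
conn.executemany("INSERT INTO fact_visits VALUES (?, ?, ?)", batch)

# Validate the loaded data in one pass.
# Each result row is (table, rowid, parent table, foreign key index).
violations = conn.execute("PRAGMA foreign_key_check(fact_visits)").fetchall()
bad_visit_ids = [rowid for _table, rowid, _parent, _fk in violations]

conn.execute("PRAGMA foreign_keys = ON")  # enforce again for later changes
print(bad_visit_ids)
```

Unlike SQL Server's `WITH CHECK`, turning enforcement back on in SQLite does not re-validate existing rows, which is why the explicit check is run first.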

Cade Roux