views:

421

answers:

5

I understand the need to have referential integrity for limiting specific values on entry or possibly preventing them from removal upon a request of deletion. However, I am unclear as to a valid use case which would exclude this mechanism from always being used.

I guess this would fall into several sub-questions:

  1. When is referential integrity not appropriate?
  2. Is it appropriate to have fields containing multiple and/or possibly incomplete subsets of a foreign key's list?
  3. Typically, should this be a schema structure design decision or an interface design decision? (Or possibly neither or both)

Thoughts?

+10  A: 

When is referential integrity not appropriate?

Referential intergrity if typically not used on Data Warehouses where the data is a read only copy of a transactional datbase. Another example of when you'd not need RI is when you want to log information which includes row ids; maintaining referential integrity for a read-only log table is a waste of database overhead.

Is it appropriate to have fields containing multiple and/or possibly incomplete subsets of a foreign key's list?

Sometimes you care more about capturing data than data quality. Imagine you are aggregating a large amount of data from disparate systems which each in their own right suffer from data quality issues. Sometimes you are after the greater good of data quality and having everything in one place even with broken keys etc. represents a starting point for moving towards true data quality. It's not ideal, but it does happen as the beenfits could outweigh the tradeoffs.

Typically, should this be a schema structure design decision or an interface design decision? (Or possibly neither or both)

Everything about systems development is centered around information security, and a key element of that is data integrity. The database structure should lean towards enforcing these things when possible, however you often are not dealing with modern database systems. Sometimes your data source is an old school AS400 with long-antiquated apps. Sometimes you have to build a data and business layer which provide for data integrity.

Just my thoughts.

Nissan Fan
+1 for mentioning data warehouses, logging and connecting disparate systems.
JKG
Just curious. If you have a read-only database with reference integrity, how does this contribute to any overhead? I was under the impression that most of the overhead with RI was related to update / delete operations.
Kimble
@Kimble: It contributes storage overhead for any indexes involved.
Andy West
Having RI on a data warehouse will significantly slow down loading because it will force you to schedule loading of parents before children instead of being able to do them in parallel.
lins314159
+2  A: 
  1. Never, though a few people in the NoSQL, the multi-value, and oo-db realms will feel differently. Don't listen to them, they're wrong.
  2. Yes. For example, if a vehicle is identified uniquely as (lotid,vin) then lotid is a foreign key to the lot table. If you want to find all pictures for a lot you can join the vehicle_pictures table right to the lot table, by using a subset of the vehicle_pictures key (lotid in (lotid,vin)). Or, am I not understanding you?
  3. Schema, interface comes second. If the schema is bad, having a nice interface is not a long term goal.
Evan Carroll
I'll vote you up just for point #3. I've never had to re-write an app because the interface was bad, but I've re-written two where the previous developer had no idea how to structure the data and the design "coded us into a corner". (Having to add columns to a table every month for a new promotional campaign rather than adding new records to a promotional campaigns table was the worst example I've seen.)
David Stratton
+6  A: 

The only case I have heard of is if you are going to load a vast amount of data into your database; in that case, it may make sense to turn referential integrity off, as long as you know for certain that the data is valid. Once your loading/migration is complete, referential integrity should be turned back on.

There are arguments about putting data validation rules in programming code vs. the database, and I think it depends on the use cases of your software. If a single application is the only path to the database, you could put validation into the program itself and probably be alright. But if several different programs are using the database at the same time (e.g. your application and your friend's application), you'll want business rules in the database so that your data is always valid.

By 'validation rules', I am talking about rules such as 'items in cart > 0'. You may or may not want validation rules. But I think that primary/foreign keys are always important (or you could find later on that you wish you had them). I think they are required if you want to do replication at some point.

Jon
+4  A: 

Referential integrity would always be appropriate if it didn't come at the cost of performance, scalability, and/or other features.

In some applications, referential integrity may be traded for something more important than the quality of the data.

Daniel Vassallo
+4  A: 
  1. When is referential integrity not appropriate?

    Sometimes when you are copying lots of records in bulk, or restoring data from some sort of backup, it is convenient to temporarily turn off the constraints of referential integrity.

  2. Is it appropriate to have fields containing multiple and/or possibly incomplete subsets of a foreign key's list?

    Duplicating data in this way goes against the concept of normalization. There are are advantages and disadvantages to this approach.

  3. Typically, should this be a schema structure design decision or an interface design decision? (Or possibly neither or both)

    I would consider it a schema design decision. Think about the best way to model your problem in relational terms. Use the database in the way it was intended.

Andy West