
I realize this question may seem a little on the "green" side, but after the number of "enterprise" or "commercial" databases I've encountered, I've begun to ask it. What advantages do constraints provide to a database? I'm asking more about Foreign Key constraints than Unique constraints. Do they offer performance gains, or just data integrity?

I've been rather surprised at the number of relational databases without foreign keys or even without specified primary keys (just constraints on fields being not null or a unique constraint on the field).

Thoughts?

+13  A: 

"just" data integrity? You say that like it's a minor thing. In all applications, it's critical. So yes, it provides that, and it's a huge benefit.

Noon Silk
I realize that it is a huge advantage, but that's the typical acknowledgment and advantage recognition. I'm wondering if there are also performance gains to be realized with that as well.
MasterMax1313
I don't know if FK constraints provide a performance gain; I'd suggest they may slightly detract, because, for example, it takes time to 'compute' whether a delete will be allowed. But really, performance is not a consideration when using them. It's not even a decision; *always* have FKs in a relational database.
Noon Silk
I agree that FKs are imperative to a database, I'm trying to come up with additional reasons to justify a re-architecture of an existing system (data integrity is unfortunately not always enough to justify that sort of thing for decision makers).
MasterMax1313
You need to show them that data integrity is critical to the ongoing ability to trust the data that the system contains. If you can't be sure it's 'real', the data is useless. It can be orphaned, reports will be wrong, bits of data could accidentally be mixed and matched (some user gets another user's info), etc. Terrible all around. It's not a decision. FKs. Always.
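Here is a minimal sketch that makes the "reports will be wrong" point concrete. It uses Python's built-in sqlite3 module, and the users/orders tables are hypothetical. If the foreign key is not enforced, an order for a nonexistent user gets stored, and the raw order total no longer matches the total you reach through a join. If the key is enforced, the bad row is rejected at insert time. Note that SQLite only enforces foreign keys when `PRAGMA foreign_keys` is on for the connection.

```python
import sqlite3


def build(enforce_fk: bool) -> sqlite3.Connection:
    """Create a small in-memory users/orders schema (hypothetical tables)."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is on for the connection.
    conn.execute("PRAGMA foreign_keys = " + ("ON" if enforce_fk else "OFF"))
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY,"
        " user_id INTEGER NOT NULL REFERENCES users(id),"
        " amount INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO users VALUES (1, 'alice')")
    conn.execute("INSERT INTO orders VALUES (1, 1, 100)")
    return conn


def report_totals(enforce_fk: bool):
    """Return (raw order total, total reachable through the users join),
    or None if the database refused to store the orphaned order."""
    conn = build(enforce_fk)
    try:
        # Order 2 points at user 99, who does not exist.
        conn.execute("INSERT INTO orders VALUES (2, 99, 250)")
    except sqlite3.IntegrityError:
        return None
    raw = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
    joined = conn.execute(
        "SELECT SUM(o.amount) FROM orders o JOIN users u ON u.id = o.user_id"
    ).fetchone()[0]
    return raw, joined


print(report_totals(enforce_fk=False))  # (350, 100): the two "totals" disagree
print(report_totals(enforce_fk=True))   # None: the orphan was rejected
```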
Noon Silk
+1 for "It's not a decision." There are a few things you never ask permission for. You just do it because it's the right (as in correct) thing to do.
Lette
+6  A: 

Data integrity is what they offer. If anything, they have a performance cost (a very minor one, at least).

cletus
They can improve performance of queries; so the performance picture is not so straightforward.
WW
+4  A: 

They provide both performance and data integrity, and the latter is paramount to any serious system. I cringe every time I see a database without any foreign keys and where all integrity is done through triggers (if at all). And I saw quite a bit of those out there.

Otávio Décio
+1 I also cringe when data integrity isn't even done in triggers -- it's done in application code!
Bill Karwin
@Bill - ouch! And yes, I saw that too.
Otávio Décio
I've got you both beat: data integrity done by application code in a separate, scheduled (every 5 minutes) application. I almost quit my job when I ran into this one.
MusiGenesis
+1  A: 

Integrity constraints are especially important when you integrate several applications using a shared database.

You may be able to properly manage data integrity in a single application's code (and even if you don't, at least the broken data affects only that application), but with multiple apps it gets hairy (and at the very least redundant).
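This is a small sketch of the shared-database case, again using Python's sqlite3 with hypothetical `departments`/`employees` tables. Two separate connections stand in for two applications. The one that skips validation is stopped by the database, not by its own code. There is one SQLite-specific caveat: foreign-key enforcement is a per-connection setting in SQLite, so each application has to enable it.

```python
import os
import sqlite3
import tempfile

SCHEMA = """
CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    dept_id INTEGER NOT NULL REFERENCES departments(id)
);
INSERT INTO departments VALUES (10, 'Sales');
"""


def open_app(path: str) -> sqlite3.Connection:
    """Each 'application' opens its own connection to the shared file."""
    conn = sqlite3.connect(path)
    # Per-connection in SQLite: every application must switch this on.
    conn.execute("PRAGMA foreign_keys = ON")
    return conn


def try_insert(conn: sqlite3.Connection, emp_id: int, dept_id: int) -> bool:
    """Insert an employee; report whether the database accepted the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO employees VALUES (?, ?)", (emp_id, dept_id))
        return True
    except sqlite3.IntegrityError:
        return False


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "shared.db")
        setup = sqlite3.connect(path)
        setup.executescript(SCHEMA)
        setup.close()

        app_a = open_app(path)  # e.g. the main application
        app_b = open_app(path)  # e.g. a hypothetical import script with no validation
        accepted = (try_insert(app_a, 1, 10), try_insert(app_b, 2, 42))
        stored = app_a.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
        app_a.close()
        app_b.close()
        return accepted, stored


print(demo())  # ((True, False), 1)
```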

Thilo
+2  A: 

In relational theory, a database that allows inconsistent data isn't really a relational database. Foreign keys are necessary for data integrity and consistency to keep the database "relational"; i.e. the logical model of the database is always correct.

In practical terms, it's usually easier to define a foreign key and let the DB engine handle making sure the relation is valid. The other options are:

  • nothing - guaranteed data corruption at some point
  • DB triggers - which will usually be slower
  • application code - which will eventually cause problems when either you forget to call the right code or another application accesses the database.
AngerClown
I would add that triggers may be hard to find if you are not used to using them, but most importantly it is EXTREMELY difficult (although most of the readers wouldn't believe me) to write consistency checking triggers 100% correctly.
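Here is one illustration of how easy it is to get a hand-written trigger wrong. It is a hedged sketch in Python's sqlite3, and the tables and trigger name are hypothetical. A trigger that checks the department on INSERT looks complete. However, an UPDATE of the child row or a DELETE of the parent row still leaves an orphan. The declared foreign key covers all three paths.

```python
import sqlite3

DEPT = "CREATE TABLE departments (id INTEGER PRIMARY KEY)"
EMP_FK = ("CREATE TABLE employees (id INTEGER PRIMARY KEY,"
          " dept_id INTEGER NOT NULL REFERENCES departments(id))")
EMP_PLAIN = "CREATE TABLE employees (id INTEGER PRIMARY KEY, dept_id INTEGER NOT NULL)"
# A typical first attempt at a hand-written check: it only covers INSERT.
INSERT_ONLY_TRIGGER = """
CREATE TRIGGER employees_dept_check BEFORE INSERT ON employees
WHEN NOT EXISTS (SELECT 1 FROM departments WHERE id = NEW.dept_id)
BEGIN
    SELECT RAISE(ABORT, 'unknown department');
END
"""


def build(use_trigger: bool) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(DEPT)
    if use_trigger:
        conn.execute(EMP_PLAIN)
        conn.execute(INSERT_ONLY_TRIGGER)
    else:
        conn.execute(EMP_FK)
    conn.execute("INSERT INTO departments VALUES (10)")
    conn.execute("INSERT INTO employees VALUES (1, 10)")
    conn.commit()
    return conn


def leaves_orphan(use_trigger: bool, statement: str) -> bool:
    """Run one statement; report whether an employee ends up pointing at a
    department that does not exist."""
    conn = build(use_trigger)
    try:
        with conn:
            conn.execute(statement)
    except sqlite3.DatabaseError:  # RAISE(ABORT) and FK failures both land here
        pass
    orphans = conn.execute(
        "SELECT COUNT(*) FROM employees e WHERE NOT EXISTS"
        " (SELECT 1 FROM departments d WHERE d.id = e.dept_id)"
    ).fetchone()[0]
    return orphans > 0


STATEMENTS = [
    "INSERT INTO employees VALUES (2, 99)",            # bad insert
    "UPDATE employees SET dept_id = 99 WHERE id = 1",  # child re-pointed
    "DELETE FROM departments WHERE id = 10",           # parent removed
]
for stmt in STATEMENTS:
    print(leaves_orphan(True, stmt), leaves_orphan(False, stmt))
# prints: False False / True False / True False
```

A complete trigger-based version would also need an UPDATE trigger on the child table and DELETE/UPDATE triggers on the parent. That is exactly the bookkeeping the declarative key does for you.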
Michal Pravda
+2  A: 

Data is an asset. Lots of textbooks state that.

But it is actually wrong. It should rather say "correct data is an asset, incorrect data is a liability".

And database constraints give you the best possible guarantee that data is correct.

+2  A: 

Simpler Application Code

One nice thing they provide is that your application code has to do a lot less error checking and validation. Contrast these two bits of code and multiply by thousands of operations and you can see there's a big win.

get department number for employee  # it's valid because of the constraints
do something with department number

vs.

get department number for employee
if department number is empty
    ...
else if department number not in list of good department numbers
    ....
else
    do something with department number
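With the department rule declared in the schema, the application side of the first version can stay that short. Below is a minimal Python/sqlite3 sketch, assuming hypothetical `departments`/`employees` tables. The reading code does no checks, and the write path handles one exception type instead of a chain of if/else branches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        dept_id INTEGER NOT NULL REFERENCES departments(id)
    );
    INSERT INTO departments VALUES (10, 'Sales'), (20, 'Support');
    INSERT INTO employees VALUES (1, 10);
""")


def department_name(emp_id: int) -> str:
    # No "is it empty?" / "is it a known department?" checks: the NOT NULL
    # and FOREIGN KEY constraints already guarantee both for stored rows.
    # (emp_id itself is assumed to exist.)
    row = conn.execute(
        "SELECT d.name FROM employees e JOIN departments d ON d.id = e.dept_id"
        " WHERE e.id = ?", (emp_id,)
    ).fetchone()
    return row[0]


def move_employee(emp_id: int, dept_id) -> bool:
    """Validation happens in the database; the code handles one exception."""
    try:
        with conn:
            conn.execute("UPDATE employees SET dept_id = ? WHERE id = ?",
                         (dept_id, emp_id))
        return True
    except sqlite3.IntegrityError:
        return False


print(move_employee(1, 20), department_name(1))      # True Support
print(move_employee(1, 99), move_employee(1, None))  # False False
print(department_name(1))                            # Support
```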

Of course, people that ignore constraints probably don't put a lot of effort into code validation anyway... :-/

Oh, and if the data constraints change, it's a database configuration issue and not a code change issue.

Mark Harrison
+3  A: 

The following, assuming you get the constraint right in the first place:

  • Your data will be valid with respect to the constraint
  • The database knows your data will be valid with respect to the constraint and can use this when querying or updating the database (e.g. removing an unnecessary join for a query on a view)
  • The constraint is documented for future users of the database
  • A violation of the constraint will be caught as soon as possible; not in some later unrelated process that fails
WW
+2  A: 

In some DBMSs (e.g. Oracle) constraints can actually improve the performance of some queries, since the optimiser can use the constraints to gain knowledge about the structure of the data. For some examples, see this Oracle Magazine article.

Tony Andrews
This is the type of information I was after.
MasterMax1313
+2  A: 

I would say all required constraints must be in the database. Foreign key constraints prevent unusable data. They aren't a nice-to-have; they are a requirement unless you want a useless database. Foreign keys may hurt the performance of deletes and updates, but that is OK. Is it better to take a little longer to do a delete (or to tell the application not to delete this person because he has orders in the system), or to delete the user but leave his data behind? A lack of foreign keys may cause unexpected and often serious problems when querying the data. For instance, the cost reports may depend on all the tables having related data, and so may fail to show important data because one or more tables have nothing to join to.
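This is a sketch of the "don't delete this person because he has orders" behaviour, using Python's sqlite3 and hypothetical `customers`/`orders` tables. `ON DELETE RESTRICT` makes the database refuse the delete. The index on the child column is there because each parent delete has to look up matching child rows, and that lookup is where the delete-time cost comes from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL
            REFERENCES customers(id) ON DELETE RESTRICT
    );
    -- Index the child column so the per-delete lookup stays cheap.
    CREATE INDEX orders_customer_id ON orders(customer_id);
    INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES (100, 1);
""")


def delete_customer(cust_id: int) -> bool:
    """Try to delete a customer; False means the database refused."""
    try:
        with conn:
            conn.execute("DELETE FROM customers WHERE id = ?", (cust_id,))
        return True
    except sqlite3.IntegrityError:
        return False


print(delete_customer(1))  # False: alice still has order 100
print(delete_customer(2))  # True: bob has no orders
```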

Unique constraints are also a requirement of any decent database. If a field or group of fields must be unique, failing to define this at the database level is to create data problems that are extremely hard to fix.
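For the "group of fields must be unique" case, a composite UNIQUE constraint is the declarative tool. Here is a minimal sketch with Python's sqlite3 and a hypothetical `enrollments` table. The same student may take many courses, and a course may have many students, but the same pair cannot be stored twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The pair (student_id, course_id) must be unique; each column alone need not be.
conn.execute(
    "CREATE TABLE enrollments ("
    " student_id INTEGER NOT NULL,"
    " course_id INTEGER NOT NULL,"
    " UNIQUE (student_id, course_id))"
)


def enroll(student_id: int, course_id: int) -> bool:
    """Return True if the row was stored, False if it broke the constraint."""
    try:
        with conn:
            conn.execute("INSERT INTO enrollments VALUES (?, ?)",
                         (student_id, course_id))
        return True
    except sqlite3.IntegrityError:
        return False


print([enroll(1, 10), enroll(1, 11), enroll(2, 10), enroll(1, 10)])
# [True, True, True, False]
```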

You don't mention other constraints, but you should. Any business rule that must always be applied to all data in the table should always be applied in the database: through a datatype (such as a datetime datatype, which will not accept '02/31/2009' as a valid date), through a constraint (say, one that does not allow the field to have a value greater than 100), or through a trigger if the logic is so complex that it cannot be handled by an ordinary constraint. (Triggers are tricky to write if you don't know what you are doing, so if you have logic this complex, you hopefully have a database professional on your team.) The order is important too. Datatypes are the first choice, followed by constraints, with triggers as a last resort.
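This sketch illustrates the constraint layer from this answer, using Python's sqlite3 and a hypothetical `progress` table whose value may not exceed 100. SQLite's column types are only affinities, and it has no native date type, so it is a weak fit for the datatype layer. This example therefore shows only the CHECK constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE progress ("
    " id INTEGER PRIMARY KEY,"
    " percent_done INTEGER NOT NULL CHECK (percent_done <= 100))"
)


def record(value) -> bool:
    """Return True if the value was stored, False if the CHECK rejected it."""
    try:
        with conn:
            conn.execute("INSERT INTO progress (percent_done) VALUES (?)", (value,))
        return True
    except sqlite3.IntegrityError:
        return False


print([record(40), record(100), record(101)])  # [True, True, False]
```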

HLGEM
+1  A: 

"Oh, and if the data constraints change, it's a database configuration issue and not a code change issue."

Unless it's a constraint that disappears from the design. In that case, there definitely IS a code impact, because some code might have been written that depends on that removed constraint being there.

That must always be taken into consideration when "laxing" or "removing" any declared constraint.

Other than that, you are of course completely right.