views:

265

answers:

6

I'm trying my best to persuade my boss into letting us use foreign keys in our databases - so far without luck.

He claims it costs a significant amount of performance, and says we'll just have jobs to cleanup the invalid references now and then.

Obviously this doesn't work in practice, and the database is flooded with invalid references.

Does anyone know of a comparison, benchmark or similar which proves there's no significant performance hit to using foreign keys? (Which I hope will convince him)

+8  A: 

From Microsoft Patterns and Practices: Chapter 14 Improving SQL Server Performance:

When primary and foreign keys are defined as constraints in the database schema, the server can use that information to create optimal execution plans.

Daniel Vassallo
sure, it can use those indexes to optimize...but that doesn't automatically mean there is no negative impact on performance. I think too many factors are involved to put this question to sleep with this reference.
Roland Bouman
@Roland: I see your point, and I agree with you. However my intention was to address the "I'm trying my best to persuade my boss" part of the question.
Daniel Vassallo
@Roland: In addition, the performance hit is very rarely severe. As you said in your answer, this is difficult to claim without knowledge of the application, but in practice the benefits of foreign keys are rarely traded for performance. Related SO post: http://stackoverflow.com/questions/1744878/can-foreign-keys-hurt-query-performance/
Daniel Vassallo
Good link, I'll send him that with the quote as well :-)
Steffen
What that statement means is that, for example, the optimizer will know that a column defined as the only member of a primary key is both not-null and uniquely defined. Similarly, a single-column foreign key's number of distinct values is constrained by the cardinality of the table on which it depends (plus possibly null).
Adam Musch
+2  A: 

It is OK to be concerned about performance, but making paranoid decisions is not.

You can easily write benchmark code to show results yourself, but first you'll need to find out what performance your boss is concerned about and detail exactly those metrics.

As far as the invalid references ar concerned, if you don't allow nulls on your foreign keys, you won't get invalid references. The database will esception if you try to assign an invalid foreign key that does not exist. If you need "nulls", assign a key to be "UNDEFINED" or something like that, and make that the default key.

Finally, explain database normalisation issues to your boss, because I think you will quickly find that this issue will be more of a problem than foreign key performance ever will.

BenB
I'm pretty certain it's not any exact performance metric he's worried about, as he just states "it hurts performance" and that's pretty much it.Obviously this doesn't make it easier to prove wrong :-(
Steffen
+2  A: 

Does anyone know of a comparison, benchmark or similar which proves there's no significant performance hit to using foreign keys ? (Which I hope will convince him)

I think you're going about this the wrong way. Benchmarks never convince anyone.

What you should do, is first uncover the problems that result from not using foreign key constraints. Try to quantify how much work it costs to "clean out invalid references". In addition, try and gauge how many errors result in the business process because of these errors. If you can attach a dollar amount to that - even better.

Now for a benchmark - you should try and get insight into your workload, identify which type of operations are done most often. Then set up a testing environment, and replay those operations with foreign keys in place. Then compare.

Personally I would not claim right away without knowledge of the applications that are running on the database that foreign keys don't cost performance. Especially if you have cascading deletes and/or updates in combination with composite natural primary keys, then I personally would have some fear of performance issues, especially timed-out or deadlocked transactions due to side-effects of cascading operations.

But no-one can tell you- you have to test yourself, with your data, your workload, your number of concurrent users, your hardware, your applications.

Roland Bouman
Good point about the application, however the tables are really simple - all primary keys are identity ints, and we have no cascading delete/updates.So basically it's as simple as it gets :-D
Steffen
Steffen, that is useful info. I would probably not have any reservations, unless dealing with massive write loads.
Roland Bouman
I believe we're doing way more reading than writing, so this shouldn't be much of an issue either.Thanks for the comment :-)
Steffen
+1  A: 

A significant factor in the cost would be the size of the index the foreign key references - if it's small and frequently used, the performance impact will be negligible, large and less frequently used indexes will have more impact, but if your foreign key is against a clustered index, it still shouldn't be a huge hit, but @Ronald Bouman is right - you need to test to be sure.

Peter LaComb Jr.
+4  A: 

There is a tiny performance hit on inserts, updates and deletes because the FK has to be checked. For an individual record this would normally be so slight as to be unoticable unless you start having a ridiculous number of FKs associated to the table (Clearly it takes longer to check 100 other tables than 2). This is a good thing not a bad thing as databases without integrity are untrustworthy and thus useless. You should not trade speed for integrity. That performance hit is usually offset by the better abilty to optimize execution plans.

We have a medium sized database with around 9 million records and FKs everywhere they should be and rarely notice a performance hit (except on one badly designed table that has well over 100 foreign keys, it is a bit slow to delete records from this as all must be checked). Almost every dba I know of who deals with large, terabyte sized databases and a true need for high performance on large data sets insists on foreign key constraints becausue integrity is key to any database. If the people with terabyte-sized databases can afford the very small performance hit, then so can you.

FKs are not automatically indexed and if they are not indexed this can cause performance problems.

Honestly, I'd take a copy of your database, add properly indexed FKs and show the time difference to insert, delete, update and select from those tables in comparision with the same from your database without the FKs. Show that you won't be causing a performance hit. Then show the results of queries that show orphaned records that no longer have meaning because the PK they are related to no longer exists. It is especially effective to show this for tables which contain financial information ("We have 2700 orders that we can't associate with a customer" will make management sit up and take notice).

HLGEM
Very useful information. I'm currently adding foreign keys to a copy of the database, however it takes some time, since I have to delete like 20-30 million invalid rows first.This just proves how big a problem the lack of keys was in the first place.
Steffen
+1  A: 

This is more of a political issue than a technical one. If your project management doesn't see any value in maintaining the integrity of your data, you need to be on a different project.

If your boss doesn't already know or care that you have thousands of invalid references, he isn't going to start caring just because you tell him about it. I sympathize with the other posters here who are trying to urge you to do the "right thing" by fighting the good fight, but I've tried it many times before and in actual practice it doesn't work. The story of David and Goliath makes good reading, but in real life it's a losing proposition.

Dave Ziffer