The consensus seems to be that all foreign keys need to have indexes. How much overhead am I going to incur on inserts if I follow the letter of the law?

NOTES:

  1. Assume that the database is a good design, and that all of the joins are legitimate.
  2. All Primary and Foreign Keys are of type Int.
  3. Some tables are lookup tables, with fewer than ten records, that are not likely to grow in size.
  4. It is an OLTP database.
  5. Some of the joins are to lookup tables with fewer than 10 records.
+1  A: 

There's no need to put an index on foreign keys that point into lookup tables with small numbers of elements.
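
As a sketch (table and column names here are hypothetical), this is the shape being described — the lookup side gets its index automatically from the primary key, and the child-side FK column is simply left unindexed:

```sql
-- Hypothetical names; T-SQL syntax.
CREATE TABLE OrderStatus (               -- lookup table, ~10 rows, rarely changes
    StatusId int NOT NULL PRIMARY KEY    -- the PK gives the referenced side its required index
);

CREATE TABLE Orders (
    OrderId  int NOT NULL PRIMARY KEY,
    StatusId int NOT NULL
        CONSTRAINT FK_Orders_OrderStatus
        REFERENCES OrderStatus (StatusId)
);
-- Note: no index is created on Orders.StatusId. With only ~10 distinct
-- values, the optimizer would likely scan Orders rather than seek anyway.
```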

McWafflestix
...but the index is used for the rows in the child table: for DRI checks, to deal with lookup-row deletes, etc.
gbn
gbn, put in an answer and I will upvote it.
Robert Harvey
@Robert: Thanks, but birdlips answered first with his article link
gbn
But it's the wrong answer in many cases. Cardinality 10 will commonly give you table scans if you want data beyond the index key.
le dorfier
Unless you need to sometimes delete from those "lookup tables with small numbers of elements".
AlexKuznetsov
@alexkuznetsov: Assuming "small numbers of elements" is actually small, such as 10-20, not really. This is, of course, making the assumption that a drop from a small enumeration table is a very rare occurrence that occurs only at database upgrade time (i.e., not interactive).
McWafflestix
In my scenario it would be an exceptionally rare occurrence, that would only happen if we decided to re-architect the application.
Robert Harvey
+1  A: 

The only possible way to answer your question is to test. For instance, if any of the keys have a cardinality of 10, an index on them probably won't be very helpful. So you've got some testing work to do. But the answer depends heavily on your table sizes, the key sizes, the absolute activity level, and the mix of CRUD operations. Distrust all simple answers.

EDIT:

If you have no data currently because this is the initial design, start with only the obvious indexes and add others as you need them based on testing. It makes little sense to add them all unless it's a low-change database. But if it's read-only, there's not much penalty at all. (Another piece of information you haven't provided.)
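
That testing can be fairly mechanical. As a minimal T-SQL sketch (all names here are made up, and a `Numbers` auxiliary table is assumed), time a batch of inserts before and after adding the candidate index and compare the statistics output:

```sql
-- Hypothetical test harness; run against a scratch database.
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- Baseline: insert a batch with no index on the FK column.
INSERT INTO Orders (OrderId, StatusId)
SELECT n, (n % 10) + 1
FROM   Numbers            -- assumed auxiliary table of integers 1..N
WHERE  n <= 100000;

-- Add the candidate index, then repeat the same insert against a fresh
-- copy of the table and compare elapsed time and logical writes.
CREATE INDEX IX_Orders_StatusId ON Orders (StatusId);
```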

le dorfier
Honestly, I'm not a big fan of this kind of testing. It requires generating large amounts of test data that, by definition, will not be representative of our actual usage patterns, which, since this is a new project, we have no knowledge of.
Robert Harvey
Your usage pattern and table sizes will have different characteristics at different times, so will need different answers to the question. There's not much point to test for data profiles you don't currently have anyway - testing for existing profiles is challenging enough. You're going to have a pretty difficult time designing databases that work if you aren't inclined to test, IMHO. Now there's a good SO question for you to ask.
le dorfier
<<Your usage pattern and table sizes will have different characteristics at different times, so will need different answers to the question>> ... I think you made my point. If the indexes are going to need to be adjusted based on actual usage patterns, then up-front testing is premature.
Robert Harvey
+1 for a good discussion of the issues.
Robert Harvey
+2  A: 

Here is a decent list of examples of when and what type of index to use. I don't think you should accept the "law" and index everything. You need to determine what will be used in the query joins and index accordingly.

northpole
The relevant bit is this: "UPDATE or DELETE operation in a PRIMARY KEY/FOREIGN KEY relationship"
gbn
The other relevant bit is "indexes to consider", not "indexes to create". There is far more involved to answer the question.
le dorfier
+2  A: 

There is a significant performance penalty on inserts as all of the indexes need to be updated. Roughly, you will incur one disk write for the insert on the large table and slightly more than one (on average) for each index on the table. Each index leaf node will incur a write, and some additional writes will occur from time to time as the leaf and (less often) parent nodes split.

Each table and index write will also incur log traffic. The particularly nasty penalty is on bulk inserted data, as active indexes on tables where you are inserting bulk loaded data will be updated for each row - and these updates are not minimally logged. This will massively blow out your I/O (which will all be random access rather than nice sequential bulk writes) and will also generate vast amounts of log traffic.
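
One common mitigation for the bulk-load case (sketched below with hypothetical names) is to disable the nonclustered indexes before the load and rebuild them afterwards, so each index is built once, in bulk, instead of being maintained row by row:

```sql
-- Hypothetical names; T-SQL.
ALTER INDEX IX_BigTable_Fk ON BigTable DISABLE;

BULK INSERT BigTable
FROM 'C:\load\bigtable.dat'
WITH (TABLOCK);     -- TABLOCK helps the load qualify for minimal logging
                    -- (also depends on the database recovery model)

ALTER INDEX IX_BigTable_Fk ON BigTable REBUILD;
```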

ConcernedOfTunbridgeWells
OLTP in a properly normalized database should work on an insert-based model rather than an update-based model for changes. I agree that it is a huge penalty on bulk insert, but how does one evaluate the impact on an OLTP database where we do a single or small rowset insert at a time?
Raj More
I don't think there is a good way, without some form of multiple-user load testing. But ten indexes on one table (eleven if you count the primary key) still seems excessive to me.
Robert Harvey
A: 

Are the fields going to be used in searching and sorting? If so, an index might be a good idea. The only way to know is to test, measure, and test again.

Edit: The lookup table will probably be cached, but that won't help a search query against the referencing table. Your data table, that is.

JoshBerke
They might be used for searching and sorting, but the lookup table will be cached anyway, given its size.
Robert Harvey
+1  A: 

The consensus seems to be that all foreign keys need to have indexes. How much overhead am I going to incur on inserts if I follow the letter of the law?

There are two overheads: on DML over the referencing table, and DML over the referenced table.

A referenced table should have an index, otherwise you won't be able to create a FOREIGN KEY.

A referencing table need not have an index. Leaving it out will make INSERT's into the referencing table a little bit slower, and won't affect INSERT's into the referenced table.

Whenever you insert a row into a referencing table, the following occurs:

  1. The row is checked against the FOREIGN KEY as in this query:

    SELECT  TOP 1 NULL
    FROM    referenced ed
    WHERE   ed.pk = @new_fk_value
    
  2. The row is inserted

  3. The index on the referencing table (if any) is updated.

The first two steps are always performed, and step 1 generally uses an index on the referenced table (again, you just cannot create a FOREIGN KEY relationship without having this index).

Step 1 is the only overhead specific to a FOREIGN KEY.

The overhead of step 3 is implied only by the fact that the index exists. It would be exactly the same if there were no FOREIGN KEY.

But UPDATE's and DELETE's from the referenced table can be much slower if you don't define an index on the referencing table, especially if the latter is large.

Whenever you DELETE from the referenced table, the following occurs:

  1. The rows are checked against the FOREIGN KEY as in this query:

    SELECT  TOP 1 NULL
    FROM    referencing ing
    WHERE   ing.fk = @old_pk_value
    
  2. The row is deleted

  3. The index on the referenced table is updated.

It's easy to see that this query will most probably benefit from an index on referencing.fk.

Otherwise, even if you are deleting a single record, the optimizer will need to build a HASH TABLE over the whole referencing table to check the constraint.
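
Both sides can be sketched in DDL like this (names are illustrative):

```sql
CREATE TABLE referenced (
    pk int NOT NULL PRIMARY KEY   -- required: an FK must target a PK or UNIQUE index
);

CREATE TABLE referencing (
    id int NOT NULL PRIMARY KEY,
    fk int NOT NULL
        CONSTRAINT FK_referencing_referenced
        REFERENCES referenced (pk)
);

-- Optional, but it turns step 1 of the DELETE check into an index seek
-- instead of a scan of the whole referencing table:
CREATE INDEX IX_referencing_fk ON referencing (fk);
```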

Quassnoi
I assume that the Referenced table is the table with the Primary Key? In the joins that I have in mind, the referenced table will have no more than about 10 records.
Robert Harvey
i.e. a lookup table.
Robert Harvey
@Robert: if you delete something from the referenced table, this can be very slow if you don't have an index on the referencing table. INSERT's into the referenced table are not affected by the FOREIGN KEY at all. INSERT's into the referencing table are affected by the index on it to the same extent as they would be affected by an index on a plain table.
Quassnoi
Assuming that the Referenced table is the lookup table, such a delete would be very unlikely, as the records in the lookup table are part of the application design. If we were to delete one of those records, some time delay would be acceptable.
Robert Harvey
+1  A: 

The only way to know the impact is to test. The answer may differ greatly depending on whether your system tends to insert large amounts of data in a bulk insert or one record at a time from the user interface. It also depends a lot on the size of the tables and the total number of indexes. Testing is the only way to know for certain what indexes you should use. A general rule of thumb is to start by indexing foreign key fields and fields you will be using in the where clauses. But that's just where to start looking at your system, not the be-all and end-all answer.

I will say that I have observed that users tend to be more tolerant of a little longer time spent on insert than they are of more time spent on querying the system. This is especially true since senior managers tend to do more querying than data entry and they can get cranky and have the power to do something about it if they feel their time is being wasted.

In a new system you need to generate test records at the expected volume the system will have when implemented. If you don't, you will find that queries (and designs) that worked fine in a small test bed can be horrible with real users doing multiple things simultaneously against large tables. It's no fun at all to refactor a database where performance wasn't considered and tested in the design. It's no fun to pull back production changes because a query exceeds the timeout setting, all because the developer didn't test against the true volume (or, in the case of a new project, the expected volume).

SQL Server has tools to help you determine the best indexes. Use the Index Tuning Wizard and the execution plans to see where you need indexes. Put indexes on the fields, and test inserts to see if there is a negative impact. There is no one right answer, and in all likelihood it won't even stay the same answer for the lifetime of your database.
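
For example, after running a representative workload, you can ask the optimizer what it wished it had. A sketch using the missing-index DMVs (available in SQL Server 2005 and later; requires VIEW SERVER STATE permission, and the results reset on instance restart):

```sql
SELECT d.statement        AS table_name,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns,
       s.user_seeks,
       s.avg_user_impact
FROM   sys.dm_db_missing_index_details d
JOIN   sys.dm_db_missing_index_groups g
       ON g.index_handle = d.index_handle
JOIN   sys.dm_db_missing_index_group_stats s
       ON s.group_handle = g.index_group_handle
ORDER BY s.avg_user_impact * s.user_seeks DESC;   -- rough ranking of candidates
```

Treat the output as candidates to evaluate, not indexes to create blindly.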

HLGEM
Tools...Yes, I will have to learn those.
Robert Harvey
+1  A: 

Insert/update/delete always hits the index and writes to it. Select sometimes hits the index to read from it, depending on the query optimizer's analysis or best guess. If you don't need an index to speed up reads (such as if the column only has a low number of potential values), then get rid of it.

If you have a billion rows in a child table and wish to delete 100 million of them because you're deleting the single parent row they all reference, then having an index will only slow the whole operation down: the system has to delete from the index too, but it will not use the index to speed up choosing which rows to delete.
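
If a mass delete like that is a rare, planned operation, one option (sketched with hypothetical names; `@parent_to_remove` is a placeholder) is to drop the index for the duration and recreate it afterwards:

```sql
DROP INDEX IX_Child_ParentId ON Child;

-- Big delete with no index maintenance; in production you would
-- typically batch this to keep the transaction log manageable.
DELETE FROM Child WHERE ParentId = @parent_to_remove;

CREATE INDEX IX_Child_ParentId ON Child (ParentId);
```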

Justice
The parent table is not likely to change, unless we redesign.
Robert Harvey
Sure. But the point is, an index on a column with low cardinality (a small number of distinct values) will not help with read performance, because the optimizer will skip the index and go straight for a table scan; moreover, it will hinder insert/update/delete performance, because the database now has to do double writes to change both the data in the table and the data in the index.
Justice
A: 

Thank you all for your input.

Based on your feedback, I think I will add indexes to all of the foreign keys EXCEPT those pointing to lookup tables (containing a small number of records that are not likely to change). This will cut the number of required foreign key indexes in half (from ten to five).

If anyone has further insight, feel free to post new answers. I still have some votes left. :)

Robert Harvey
A: 

I know performance is a critical issue.

IMO, you should consider the ramifications of not having an index (and therefore no FK) on OLTP data. You can suffer data integrity issues on such a system.

Raj More
The foreign keys do carry constraints to the lookup tables. That should be sufficient for referential integrity.
Robert Harvey