The consensus seems to be that all foreign keys need to have indexes. How much overhead am I going to incur on inserts if I follow the letter of the law?
There are two kinds of overhead: on DML against the referencing table, and on DML against the referenced table.
The referenced table must have a unique index on the referenced columns (a PRIMARY KEY, a UNIQUE constraint, or a unique index); otherwise you won't be able to create the FOREIGN KEY.
The referencing table does not need an index. Having one makes INSERTs into the referencing table a little slower, and does not affect INSERTs into the referenced table.
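
To make the discussion concrete, here is a minimal sketch of the two tables, assuming the table and column names used in the queries in this answer (referenced.pk and referencing.fk). The id column and the constraint name are made up for illustration.

```sql
-- Hypothetical schema; only the names referenced, referencing, pk and fk
-- come from the queries in this answer.
CREATE TABLE referenced (
    pk INT NOT NULL PRIMARY KEY   -- the PRIMARY KEY supplies the unique index the FOREIGN KEY requires
);

CREATE TABLE referencing (
    id INT NOT NULL PRIMARY KEY,  -- assumed surrogate key, not part of the original discussion
    fk INT NOT NULL,
    CONSTRAINT fk_referencing_referenced   -- assumed constraint name
        FOREIGN KEY (fk) REFERENCES referenced (pk)
);
-- SQL Server does not create an index on referencing.fk automatically;
-- at this point the referencing column is unindexed.
```
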
Whenever you insert a row into the referencing table, the following occurs:
1. The row is checked against the FOREIGN KEY, as in this query:
    SELECT TOP 1 NULL
    FROM referenced ed
    WHERE ed.pk = @new_fk_value
2. The row is inserted.
3. The index on the referencing table (if any) is updated.
The first two steps are always performed, and step 1 generally uses the index on the referenced table (again, you cannot create a FOREIGN KEY relationship without that index).
Step 1 is the only overhead specific to the FOREIGN KEY.
The overhead of step 3 comes only from the fact that the index exists. It would be exactly the same if there were no FOREIGN KEY.
But UPDATEs and DELETEs on the referenced table can be much slower if you don't define an index on the referencing table, especially if the latter is large.
Whenever you DELETE
from the referenced table, the following occurs:
The rows are checked against the FOREIGN KEY
as in this query:
SELECT TOP 1 NULL
FROM referencing ing
WHERE ing.fk = @old_pk_value
The row is deleted
- The index on the row is updated.
It's easy to see that this query will most probably benefit from an index on referencing.fk.
Otherwise, to check the constraint the optimizer has to read the whole referencing table (for example, by building a HASH TABLE over it), even if you are deleting a single record.
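
As a rough sketch of the fix, assuming tables named referenced and referencing with columns pk and fk as in the queries above, the index could be created like this. The index name and the value 42 are arbitrary, and the transaction is rolled back so the test delete leaves no trace.

```sql
-- Index on the referencing column, so the constraint check can seek
-- instead of reading the whole table. The index name is an assumption.
CREATE NONCLUSTERED INDEX ix_referencing_fk
    ON referencing (fk);

-- One way to observe the effect: compare the logical reads reported
-- for the referencing table before and after creating the index.
SET STATISTICS IO ON;

BEGIN TRANSACTION;
DELETE FROM referenced
WHERE pk = 42;            -- arbitrary example value
ROLLBACK TRANSACTION;     -- undo the delete; this is only a measurement

SET STATISTICS IO OFF;
```

Whether the extra index is worth it depends on how often rows in the referenced table are deleted or have their key updated, compared with how often rows are inserted into the referencing table.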