I have read that one of the tradeoffs of adding table indexes in SQL Server is that insert/update/delete statements become more expensive in exchange for faster selects.

I can conceptually understand what happens on an insert, because SQL Server has to write an entry into each index for every new row. Update and delete are murkier to me, because I can't quite wrap my head around what the database engine has to do.

Let's take DELETE as an example and assume I have the following schema:

CREATE TABLE Foo (
     col1 int
    ,col2 int
    ,col3 int
    ,col4 int
    ,PRIMARY KEY (col1, col2)
);

CREATE NONCLUSTERED INDEX IX_1 ON Foo (col3) INCLUDE (col4);
Now, if I issue the statement

DELETE FROM Foo WHERE col1 = 12 AND col2 > 34

I understand what the engine must do to update the table (or clustered index, if you prefer). The clustered index makes it easy to find the range of rows to be removed and to remove them.

However, at this point it also needs to update IX_1, and the query I issued gives the engine no obvious, efficient way to find the index entries to remove. Is it forced to do a full index scan at this point? Does the engine read the rows from the clustered index first and generate a smarter internal delete against the index?

It might help me wrap my head around this if I understood better what is going on under the hood, but I guess my real question is this: I have a database that is spending a significant amount of time on deletes, and I'm trying to figure out what I can do about it.

When I display the execution plan for the deletion, it just shows a "Clustered Index Delete" entry on table Foo; the details section lists the other indexes that need to be updated, but I get no indication of the relative cost of maintaining them.
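
(In case it matters, this is roughly how I'm capturing the plan; SET STATISTICS XML ON would produce the actual rather than the estimated plan:)

SET SHOWPLAN_XML ON;
GO
-- Estimated plan only; the DELETE is not actually executed.
DELETE FROM Foo WHERE col1 = 12 AND col2 > 34;
GO
SET SHOWPLAN_XML OFF;
GO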

Are they all equal in this case? Is there some way I can estimate the impact of removing one or more of these indexes without having to actually try it?

+2  A: 

Nonclustered indexes also store the clustering keys.
It does not have to do a full scan, since:

  • your query will use the clustered index to locate the rows to delete
  • each of those rows contains the value of the other index's key (col3)
  • using that key value (col3) together with the clustering key values (col1, col2), the engine can seek directly to the matching entry in the other index (see the sketch below).

(Note: I had trouble interpreting the docs, but I would imagine that IX_1 in your case behaves as if it were also sorted on col1, col2. Since those values are already stored in the index, it makes perfect sense to use them to locate entries more efficiently for updates and deletes.)
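
A quick way to convince yourself that the clustering keys really do live in the nonclustered index (a sketch; the WITH (INDEX(...)) hint merely forces the access path):

-- IX_1 alone can answer this query with no key lookup, because its leaf
-- entries hold col3 (the key), col4 (included) and the clustering keys
-- (col1, col2).
SELECT col1, col2, col3, col4
FROM Foo WITH (INDEX(IX_1))
WHERE col3 = 99;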

All this, however, has a cost. For each matching row:

  • it has to read the row to find the value of col3
  • it has to find the entry for (col3, col1, col2) in the nonclustered index
  • it has to delete the entry from there as well.

Furthermore, while the range query can be efficient on the clustered index in your case (linear access after finding the first match), maintaining the other indexes will most likely mean random access to them for every matching row. Random access has a much higher cost than merely enumerating B+ tree leaf nodes from a given match onwards.
Given the above query, more time is spent on nonclustered index maintenance; how much more depends heavily on the number of records selected by the col1 = 12 AND col2 > 34 predicate.

My guess is that the cost is conceptually the same as if you had no secondary index, but instead kept a separate table holding (col3, col1, col2) as its clustered key, and issued a DELETE against it for each matching row. Obviously, index maintenance is internal to SQL Server and faster than that, but conceptually I'd guess it is close.
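
Roughly, that analogy looks like this (a conceptual sketch only; Foo_IX1 is a hypothetical stand-in for the index, not how SQL Server actually stores it):

-- Stand-in table keyed the way IX_1 is keyed internally.
CREATE TABLE Foo_IX1 (
     col3 int
    ,col1 int
    ,col2 int
    ,col4 int   -- stands in for the included column at the leaf level
    ,PRIMARY KEY (col3, col1, col2)
);

-- The "index maintenance" part of the DELETE, expressed against the stand-in:
DELETE ix
FROM Foo_IX1 AS ix
JOIN Foo AS f
  ON  f.col1 = ix.col1
  AND f.col2 = ix.col2
  AND f.col3 = ix.col3
WHERE f.col1 = 12 AND f.col2 > 34;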

The above would mean that the maintenance costs of the secondary indexes stay pretty close to each other, since each holds the same number of entries (one per record) and deletion proceeds entry by entry in each index.
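
If you want to measure rather than guess, SQL Server exposes per-index operational counters; a sketch (the counters are cumulative, so compare snapshots taken before and after your delete; deleted leaf rows often show up as ghosted first):

SELECT  i.name AS index_name,
        s.leaf_insert_count,
        s.leaf_delete_count,
        s.leaf_ghost_count
FROM sys.dm_db_index_operational_stats(DB_ID(), OBJECT_ID('Foo'), NULL, NULL) AS s
JOIN sys.indexes AS i
  ON  i.object_id = s.object_id
  AND i.index_id  = s.index_id;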

If you need the indexes, then depending on the number of deleted records you might be better off scheduling the deletes, dropping the indexes that are not used during the delete beforehand, and adding them back afterwards. Depending on the number of records affected, rebuilding the indexes may be faster than maintaining them row by row.
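
A sketch of that approach (it assumes your workload can tolerate IX_1 being absent for the duration of the maintenance window):

DROP INDEX IX_1 ON Foo;

DELETE FROM Foo WHERE col1 = 12 AND col2 > 34;

CREATE NONCLUSTERED INDEX IX_1 ON Foo (col3) INCLUDE (col4);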

andras
Add on top of that the cost of locking each index. For the purpose of locking, each index is a separate resource.
Chris Bednarski
@KNoodles: indeed, locking introduces additional overhead. (Though if many fine-grained lock operations are taking place, SQL Server may escalate them to a single coarser lock to avoid that overhead for the rest of the operation.)
andras