What methods are there for identifying superfluous columns in covering indices: columns that are never searched against and could therefore be moved into INCLUDE columns, or even removed completely, without affecting the applicability of the index?
To clarify things: the idea of a covering index is that it also includes columns that may not be searched by (used in the WHERE clause and such) but may be selected (part of the SELECT column list).
There doesn't seem to be any easy way to detect unused columns in a covering index. I can only think of the painstaking process below:
- For a representative period of time, record all queries being run on the server (or on the table desired)
- Filter out (through regular expression) queries not involving the underlying table
- For remaining queries, obtain the query plan; discard queries not involving the index in question
- For the remaining queries, or rather for each "template" of query (many queries are the same except for the search criteria values), list the index columns that appear in the SELECT list, the WHERE clause, or a JOIN condition
- The index columns not found in that list can safely be removed.
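The text-matching part of these steps can be sketched roughly as follows. This is only an illustration: the function names `to_template` and `unused_index_columns` are made up for this example, the matching is naive word matching rather than real SQL parsing, and the query-plan step is skipped entirely. A column named by a query that never actually uses the index still counts as used, so the result errs on the side of keeping columns.

```python
import re

def to_template(query: str) -> str:
    """Collapse literals so queries that differ only in search values match."""
    q = re.sub(r"'(?:[^']|'')*'", "?", query)   # string literals
    q = re.sub(r"\b\d+(?:\.\d+)?\b", "?", q)    # numeric literals
    return re.sub(r"\s+", " ", q).strip().lower()

def unused_index_columns(queries, table, index_columns):
    """Return the index columns never named by any logged query on `table`."""
    table_re = re.compile(rf"\b{re.escape(table.lower())}\b")
    templates = {to_template(q) for q in queries}
    # Keep only templates that mention the table at all.
    relevant = [t for t in templates if table_re.search(t)]
    used = set()
    for t in relevant:
        words = set(re.findall(r"[a-z_][a-z0-9_]*", t))
        used |= {c for c in index_columns if c.lower() in words}
    return [c for c in index_columns if c not in used]

# Hypothetical log: the table and column names are invented for the example.
logged = [
    "SELECT order_id, status FROM orders WHERE customer_id = 42",
    "SELECT order_id, status FROM orders WHERE customer_id = 7",
    "SELECT name FROM customers WHERE id = 3",
]
print(unused_index_columns(logged, "orders",
                           ["customer_id", "status", "created_at"]))
```

Anything this reports as unused would still need to be checked against the actual query plans before the index is changed.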
There may be a few more columns to remove, because the process above doesn't check in which context the covering index is used: it may be used to resolve the WHERE clause while the underlying table is still accessed as well (for example, to get to columns not in the covering index).
The above clinical approach is rather unattractive. An analytical approach may be preferable:
Find all query "templates" that may be used by the applications using the server. Among these templates, find the ones that may use the covering index. These are (again, with a few holes) queries that:
- include a reference to the underlying table
- do not cite in any way a column from the underlying table that is not a column in the index
- do not use a search criterion from the underlying table that is more selective than the columns of the index (in their very order...)
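As a rough sketch, the first two criteria amount to a membership test plus a subset test. The `Template` record and the `covering_candidates` function below are hypothetical names, and the per-template table and column sets are assumed to have been collected by hand or by some other tool. The third criterion (selectivity) is not modeled here, since it depends on the data rather than on the query text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Template:
    name: str
    tables: frozenset    # tables the template references
    columns: frozenset   # columns of the table of interest that it cites

def covering_candidates(templates, table, index_columns):
    """Names of templates that reference `table` and cite only index columns."""
    index_cols = frozenset(index_columns)
    return [
        t.name
        for t in templates
        if table in t.tables and t.columns <= index_cols
    ]

# Hypothetical templates: all names are invented for the example.
templates = [
    Template("orders_by_customer", frozenset({"orders"}),
             frozenset({"customer_id", "status"})),
    Template("order_details", frozenset({"orders"}),
             frozenset({"customer_id", "shipping_address"})),
    Template("customer_lookup", frozenset({"customers"}), frozenset()),
]
print(covering_candidates(templates, "orders",
                          ["customer_id", "status", "created_at"]))
```

Here only the first template qualifies: the second cites a column outside the index, and the third never touches the table.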
Or, without even going to the applications: think of all the use cases, and of whether the queries that would serve these cases would benefit from all the columns in the index. Doing so implies that you have a relatively good idea of the selectivity of the index, regarding its first few columns.
If you do audits of your use cases and data points, obviously anything that isn't used or caught in the audit is a candidate for deletion. If the database lacks such a thorough audit, you can save a time-window's worth of queries that hit the database by running a trace and saving it. You can analyze the trace and see what type of queries are hitting the database and from there intuit which columns can be dropped.
Trace analysis is typically used to find candidates for missing indices, but I'm guessing that it could also be used to analyze usage trends.
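One way such a trace could be analyzed is sketched below, under heavy assumptions: the trace has already been exported as plain query strings and filtered down to the table of interest, and everything from the first WHERE or ON keyword onward is treated as "searched" while everything before it is treated as "selected". That split is a crude stand-in for real SQL parsing (subqueries and string literals will confuse it), and `classify_index_columns` is a made-up helper name. Columns seen only in the selected part are candidates for INCLUDE; columns seen nowhere are candidates for removal.

```python
import re

_WORD = re.compile(r"[a-z_][a-z0-9_]*")
_PREDICATE_START = re.compile(r"\b(?:where|on)\b")

def classify_index_columns(queries, index_columns):
    """Label each index column as 'key', 'include', or 'unused'."""
    searched, selected = set(), set()
    for q in queries:
        q = q.lower()
        m = _PREDICATE_START.search(q)
        # Naive split: text before the first WHERE/ON is the select part.
        head, tail = (q[:m.start()], q[m.start():]) if m else (q, "")
        selected |= set(_WORD.findall(head))
        searched |= set(_WORD.findall(tail))
    result = {}
    for col in index_columns:
        c = col.lower()
        if c in searched:
            result[col] = "key"        # appears in a predicate or join
        elif c in selected:
            result[col] = "include"    # only ever selected
        else:
            result[col] = "unused"     # never seen in the trace
    return result

# Hypothetical trace: the table and column names are invented for the example.
trace = [
    "SELECT order_id, status FROM orders WHERE customer_id = 42",
    "SELECT status FROM orders o JOIN customers c ON o.customer_id = c.id",
]
print(classify_index_columns(trace, ["customer_id", "status", "created_at"]))
```

The labels are only as good as the trace window: a column used by a monthly report will look unused in a one-day trace.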