I've searched a bit and didn't see any similar question, so here goes.

How do you know when to put an index on a table? How do you decide which columns to include in the index? When should a clustered index be used?

Can an index ever slow down the performance of SELECT statements? How many indexes is too many, and how big does a table need to be before it benefits from an index?

EDIT:

What about column data types? Is it ok to have an index on a varchar or datetime?

A: 

This is really a very involved question, but a good starting place is to index any column that you filter results on. For example, if you often break products into groups by sale price, index the sale_price column of the products table so that the query can use the index instead of scanning the whole table.

Matthew Vines
A: 

If you are querying based on the value in a column, you probably want to index that column.

e.g.

SELECT a, b, c FROM MyTable WHERE x = 1

You would want an index on x.

Generally, I add indexes for columns which are frequently queried, and I add compound indexes when I'm querying on more than one column.

Indexes generally won't hurt the performance of a SELECT, but they may slow down INSERTs (or UPDATEs) if you have too many indexed columns per table.

As a rule of thumb - start off by adding indexes when you find yourself saying WHERE a = 123 (in this case, an index for "a").
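To see the effect of that rule of thumb, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database (an assumption made only so the example runs anywhere; other engines report plans differently). MyTable and its columns come from the query above; the index name idx_mytable_x and the sample data are hypothetical.

```python
import sqlite3

# In-memory SQLite database used purely as a runnable stand-in.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (a INTEGER, b INTEGER, c INTEGER, x INTEGER)")
conn.executemany(
    "INSERT INTO MyTable (a, b, c, x) VALUES (?, ?, ?, ?)",
    [(i, i * 2, i * 3, i % 100) for i in range(10_000)],
)

def query_plan(sql):
    """Return the planner's description of how it would run the statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan detail

query = "SELECT a, b, c FROM MyTable WHERE x = 1"

plan_before = query_plan(query)  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_mytable_x ON MyTable (x)")  # hypothetical index name
plan_after = query_plan(query)   # the planner can now search the index on x

print(plan_before)
print(plan_after)
```

Comparing the two printed plans is the same habit you would apply with your own database's plan viewer: check that the index is actually picked up before assuming it helps.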

Jamie
A: 

You should use an index on columns that you use for selection and ordering - i.e. the WHERE and ORDER BY clauses.
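As a rough illustration of the ORDER BY case, the sketch below again uses Python's sqlite3 module as a stand-in engine (an assumption; plan wording is SQLite-specific). The products table and sale_price column are borrowed from an earlier answer, while the index name and data are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, sale_price REAL)")
conn.executemany(
    "INSERT INTO products (name, sale_price) VALUES (?, ?)",
    [(f"product {i}", (i * 37) % 500) for i in range(5_000)],
)

def query_plan(sql):
    """Return the planner's description of how it would run the statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

query = "SELECT id, sale_price FROM products ORDER BY sale_price"

plan_before = query_plan(query)  # needs a temporary B-tree to sort the rows
conn.execute("CREATE INDEX idx_products_sale_price ON products (sale_price)")
plan_after = query_plan(query)   # rows are read from the index, already in order

print(plan_before)
print(plan_after)
```

The point of the comparison is that an index on the ORDER BY column lets the engine skip a separate sort step.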

Indexes can slow down select statements if there are many of them and you are using WHERE and ORDER BY on columns that have not been indexed.

As for size of table - several thousand rows and upwards would start showing real benefits from index usage.

Having said that, there are automated tools to do this, and SQL Server has a Database Tuning Advisor that will help with this.

Oded
the ITW is now called "Database Tuning Advisor (DTA)" in SQL Server 2005 and up
marc_s
@marc_s - Thanks for that. Answer updated.
Oded
+3  A: 

Well, the first question is easy:

When should a clustered index be used?

Always. Period. Except for a very few, rare, edge cases. A clustered index makes a table faster, for every operation. YES! It does. See Kim Tripp's excellent "The Clustered Index Debate Continues" for background info. She also mentions her main criteria for a clustered index:

  • narrow
  • static (never changes)
  • unique
  • if ever possible: ever increasing

INT IDENTITY fulfills this perfectly - GUIDs do not. See "GUIDs as Primary Key" for extensive background info.

Why narrow? Because the clustering key is added to each and every entry of each and every non-clustered index on the same table (in order to be able to actually look up the data row, if needed). You don't want to have VARCHAR(200) in your clustering key.
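A back-of-the-envelope calculation makes the cost of a wide clustering key concrete. This is only a rough sketch: the row count and number of non-clustered indexes are hypothetical, and it ignores page overhead, compression, and variable-length storage details.

```python
def clustering_key_overhead_bytes(key_bytes, row_count, nonclustered_indexes):
    """Rough bytes spent repeating the clustering key in non-clustered index entries."""
    return key_bytes * row_count * nonclustered_indexes

ROWS = 10_000_000  # hypothetical table size
INDEXES = 5        # hypothetical number of non-clustered indexes

int_overhead = clustering_key_overhead_bytes(4, ROWS, INDEXES)        # INT: 4 bytes
guid_overhead = clustering_key_overhead_bytes(16, ROWS, INDEXES)      # GUID: 16 bytes
varchar_overhead = clustering_key_overhead_bytes(200, ROWS, INDEXES)  # VARCHAR(200), fully used

for label, size in [("INT", int_overhead), ("GUID", guid_overhead), ("VARCHAR(200)", varchar_overhead)]:
    print(f"{label}: {size / 1_000_000:,.0f} MB")
```

Under these assumptions the INT key costs about 200 MB across the five indexes, the GUID about 800 MB, and a fully populated VARCHAR(200) about 10,000 MB.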

Why unique? See above: the clustering key is the mechanism that SQL Server uses to uniquely find a data row, so it has to be unique. If you pick a non-unique clustering key, SQL Server itself will add a 4-byte uniquifier to any duplicate key values. Be careful of that!

Next: non-clustered indices. Basically there's one rule: any foreign key in a child table referencing another table should be indexed, it'll speed up JOINs and other operations.
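Here is a small sketch of the foreign key rule, using Python's sqlite3 module as a stand-in engine (an assumption; the answer itself is about SQL Server, and plan output there looks different). The customers and orders tables, the index name, and the data are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers (id),  -- foreign key in the child table
        total REAL
    );
""")
conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(i, f"customer {i}") for i in range(1, 101)],
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100 + 1, i * 1.5) for i in range(1_000)],
)

def query_plan(sql):
    """Return the planner's description of how it would run the statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

join_query = (
    "SELECT o.id, o.total FROM customers c "
    "JOIN orders o ON o.customer_id = c.id WHERE c.id = 42"
)

plan_before = query_plan(join_query)
conn.execute("CREATE INDEX idx_orders_customer_id ON orders (customer_id)")
plan_after = query_plan(join_query)  # child rows are now found through the new index

print(plan_before)
print(plan_after)
```

Without the index the engine has to improvise to find the matching child rows; with it, the join can look them up directly.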

Furthermore, any queries that have WHERE clauses are good candidates - pick those first which are executed a lot. Put indices on columns that show up in WHERE clauses and in ORDER BY clauses.

Next: measure your system, check the DMVs (dynamic management views) for hints about unused or missing indices, and tweak your system over and over again. It's an ongoing process; you'll never be done! See here for info on those two DMVs (missing and unused indices).

Another word of warning: with a truckload of indices, you can make any SELECT query go really really fast. But at the same time, INSERTs, UPDATEs and DELETEs which have to update all the indices involved might suffer. If you only ever SELECT - go nuts! Otherwise, it's a fine and delicate balancing act. You can always tweak a single query beyond belief - but the rest of your system might suffer in doing so. Don't over-index your database! Put a few good indices in place, check and observe how the system behaves, and then maybe add another one or two, and again: observe how the total system performance is affected by that.
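In the spirit of "check and observe", the sketch below times the same batch of inserts against a table with zero and with four secondary indexes. It uses Python's sqlite3 module as a stand-in (an assumption), the table and index names are hypothetical, and the absolute numbers will vary by machine, so treat it as a way to measure rather than as a benchmark result.

```python
import sqlite3
import time

def time_inserts(index_count, rows=20_000):
    """Insert rows into a table with index_count (0 to 4) secondary indexes; return seconds."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (id INTEGER PRIMARY KEY, c0 INTEGER, c1 INTEGER, c2 INTEGER, c3 INTEGER)"
    )
    for i in range(index_count):
        conn.execute(f"CREATE INDEX idx_t_c{i} ON t (c{i})")
    # Scrambled values so index entries do not arrive in sorted order.
    data = [
        (n * 7919 % rows, n * 104729 % rows, n * 31 % rows, n * 17 % rows)
        for n in range(rows)
    ]
    start = time.perf_counter()
    with conn:  # one transaction for the whole batch
        conn.executemany("INSERT INTO t (c0, c1, c2, c3) VALUES (?, ?, ?, ?)", data)
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed

print(f"0 indexes: {time_inserts(0):.3f}s")
print(f"4 indexes: {time_inserts(4):.3f}s")
```

Each extra index is one more structure to maintain on every write, so the indexed run will typically take longer; the same kind of before-and-after measurement is worth repeating whenever you add an index to a real system.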

marc_s
+1 for noting that it's an ongoing process and not something you just do once.
John M Gant
Actually, our DB is both SQL Server and Postgres, so you got a bit too specific on implementation there, but otherwise a good explanation.
Earlz
Yes, considering Oracle doesn't have clustering indexes as such (they do have index-organized tables and b-tree clusters) and a clustering index on DB2 for z/OS is used as a guideline to cluster data, but not law. Indexes can further slow down selections, if the optimizer doesn't have a good handle on cardinality of the result set -- a full scan may be less expensive than an index access.
Adam Musch
+1  A: 

Rule of thumb: index the primary key (the index is implied and defaults to clustered) and each foreign key column.

There is more, but you could do worse than using SQL Server's missing index DMVs.

An index may slow down a SELECT if the optimiser makes a bad choice, and it is possible to have too many. Too many will slow writes, and it's also possible to end up with overlapping indexes that duplicate each other.
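One common form of overlap is a single-column index whose column is already the leading column of a compound index. The sketch below, using Python's sqlite3 module as a stand-in engine (an assumption) with hypothetical table and index names, shows a compound index serving a query that filters on its first column only, which is why a separate index on that column alone would often be redundant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT, total REAL)"
)
# Only the compound index exists; there is no index on customer_id by itself.
conn.execute("CREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date)")

rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT total FROM orders WHERE customer_id = 42"
).fetchall()
plan_text = " | ".join(row[3] for row in rows)

print(plan_text)  # the compound index is used for the single-column filter
```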

gbn
+1  A: 

Answering the ones I can: I would say that every table, no matter how small, will benefit from at least one index, as there has to be at least one way in which you are interested in looking up the data; otherwise why store it?

A general rule is to add an index if you need to find data in the table using a particular field or set of fields. As for how many indexes are too many: generally, the more indexes you have, the slower inserts and updates will be, since they also have to modify the indexes, but it all depends on how you use your data. If you need fast inserts, don't use too many. In reporting or "read only" data stores you can have a number of them to make all your lookups faster.

Unfortunately there is no one rule to guide you on the number or type of indexes to use, although the query optimiser of your chosen DB can give hints based on the queries you are executing.

As for clustered indexes, they are the ace card you only get to use once per table, so choose carefully. It's worth calculating the selectivity of the field you are thinking of putting it on: a clustered index is wasted on something like a boolean field (a contrived example), because the selectivity of the data is very low.
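Selectivity can be estimated as the number of distinct values divided by the total number of rows. A minimal sketch, using Python's sqlite3 module as a stand-in engine (an assumption) and a hypothetical users table with a unique email column and a boolean-like is_active column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, is_active INTEGER)")
conn.executemany(
    "INSERT INTO users (email, is_active) VALUES (?, ?)",
    [(f"user{i}@example.com", i % 2) for i in range(1_000)],
)

def selectivity(column):
    """Distinct values divided by total rows; 1.0 means every row has its own value."""
    # The column name is interpolated into the SQL, so pass only trusted, hard-coded names.
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM users"
    ).fetchone()
    return distinct / total

print(selectivity("email"))      # 1.0: every row is distinct
print(selectivity("is_active"))  # 0.002: only two values across 1,000 rows
```

A value close to 1.0 marks a column that narrows a lookup to very few rows; a value close to 0 marks one, like the boolean here, where an index does little to narrow the search.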

Tony
@Tony "Otherwise why store it" What about a system log, where the log is inserted into very often (many times per minute) but the data is retrieved only when something happens where the log is needed (as in, once every month or two)?
Earlz
@Earlz: fair point, but when you do look at the log an index will help you search the millions of rows the log table contains. I can see I was being a bit over the top with that statement :)
Tony