"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." (Donald Knuth). My SQL tables are unlikely to contain more than a few thousand rows each (and those are the big ones!). SQL Server Database Engine Tuning Advisor dismisses the amount of data as irrelevant. So I shouldn't even think about putting explicit indexes on these tables. Correct?

+2  A: 

If the rows are narrow and a few thousand of them fit on, say, 10-20 8K pages, it is unlikely that the SQL optimiser would choose to use an index even if you created one.

Mitch Wheat
A: 

I guess the automatic index on the table's primary key should be sufficient when querying a table with so little data.

So yes, explicit indexes can be avoided when there is only a small data set to work on.

Nrj
+1  A: 

Put indexes ONLY if you have to :)
There are times when adding indexes can actually hurt performance, depending on what the table is used for...
In other words, add indexes to a table only when profiling the application shows they are necessary.

Aleris
+1  A: 

Indexes are often created implicitly when using UNIQUE constraints. I wouldn't try to avoid their use in that case!
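For instance, a minimal sketch of that implicit behaviour (the table and column names are hypothetical):

    CREATE TABLE dbo.Users (
        UserID INT NOT NULL PRIMARY KEY,
        Email  VARCHAR(255) NOT NULL
    );

    -- Adding the constraint makes the engine build a unique index on Email.
    ALTER TABLE dbo.Users
        ADD CONSTRAINT UQ_Users_Email UNIQUE (Email);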

Greg Hewgill
Fair point. I edited the question to say "explicit indexes".
onedaywhen
If you add a UNIQUE constraint, the database will always (not just "often") add an index.
Sten Vesterli
+5  A: 

Primary key columns will be indexed for the unique constraint. I would still index all foreign key columns. The optimizer can choose to ignore your index if it is irrelevant.

If you only have a little bit of data, then the extra cost for insert/update should not be significant either.

WW
+15  A: 

The value of indexes is in speeding reads. For instance, if you are doing lots of SELECTs based on a range of dates in a date column, it makes sense to put an index on that column. And of course, generally you add indexes on any column you're going to be JOINing on with any significant frequency. The efficiency gain is also related to the ratio of the size of your typical recordsets to the number of records (i.e. grabbing 20/2000 records benefits more from indexing than grabbing 90/100 records). A lookup on an unindexed column is essentially a linear search.
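For instance, a minimal sketch of the date-range case (the Orders table and its columns are hypothetical):

    -- Hypothetical table used for illustration.
    CREATE TABLE dbo.Orders (
        OrderID   INT IDENTITY PRIMARY KEY,
        OrderDate DATETIME NOT NULL,
        Total     MONEY NOT NULL
    );

    -- A nonclustered index lets a selective date-range predicate seek
    -- instead of scanning the whole table.
    CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
        ON dbo.Orders (OrderDate);

    SELECT OrderID, OrderDate, Total
    FROM dbo.Orders
    WHERE OrderDate >= '20080101' AND OrderDate < '20080201';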

The cost of indexes comes on writes, because every INSERT also requires an internal insert to each column index.

So, the answer depends entirely on your application -- if it's something like a dynamic website where the number of reads can be 100x or 1000x the writes, and you're doing frequent, disparate lookups based on data columns, indexing may well be beneficial. But if writes greatly outnumber reads, then your tuning should focus on speeding those queries.

It takes very little time to identify and benchmark a handful of your app's most frequent operations both with and without indexes on the JOIN/WHERE columns, I suggest you do that. It's also smart to monitor your production app and identify the most expensive, and most frequent queries, and focus your optimization efforts on the intersection of those two sets of queries (which could mean indexes or something totally different, like allocating more or less memory for query or join caches).
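One way to run that benchmark in SQL Server, reusing the hypothetical Orders table sketched above:

    -- Report I/O (logical reads) and CPU/elapsed time for each statement,
    -- so the same query can be compared with and without an index.
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    SELECT OrderID, OrderDate, Total
    FROM dbo.Orders
    WHERE OrderDate >= '20080101' AND OrderDate < '20080201';

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;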

joelhardi
If the tables are small enough, have a clustered index (usually on the primary key), and no covering index satisfies a query, the SQL Server optimiser will not use a nonclustered index; it will table scan instead. This is because bookmark lookups into the clustered index are expensive.
Mitch Wheat
That makes sense ... row data is stored in leaf nodes of the clustered index, so SQL is searching over the same pages with an index lookup or table scan (roughly speaking). I was answering more generally -- nonclustered indexes on non-PK columns that enable a b-tree lookup in place of a linear scan.
joelhardi
+1  A: 

Absolutely incorrect. 100% incorrect. Don't put a million pointless indexes, but you do want a Primary Key (in most cases), and you do want it CLUSTERED correctly.

Here's why:

SELECT * FROM MySmallTable <-- No worries... Index won't help

SELECT
    *
FROM
    MyBigTable INNER JOIN MySmallTable ON... <-- Ahh, now I'm glad I have my index.

Here's a good rule to go by.

"Since I have a TABLE, I'm likely going to want to query it at some time... If I'm going to querey it, I'm likely going to do so in a consistent way..." <-- That's how you should index the table.

EDIT: I'm adding this line: If you have a concrete example in mind, I'll show you how to index it, and how much of a savings you'll get from doing so. Please supply a table, and an example of how you plan on using that table.

Timothy Khouri
+3  A: 

It depends. Is the table a reference table?

There are tables of a thousand rows where the absence of an index, and the resulting table scans, can make the difference between a fairly simple operation delaying the user by 5 seconds and delaying them by 5 minutes. I have seen exactly this problem, using a DBMS other than SQL Server.

Generally, if the table is a reference table, updates on it will be relatively rare. This means that the performance hit for updating the index will also be relatively rare. If the optimizer passes over the index, the performance hit on the optimizer will be negligible. The space needed to store the index will also be negligible.

If you declare a primary key, you should get an automatic index on that key. That automatic index will almost always do you enough good to justify its cost. Leave it in there. If you create a reference table without a primary key, there are other problems in your design methodology.

If you do frequent searches or frequent joins on some set of columns other than the primary key, an additional index might pay for itself. Don't fix that problem unless it is a problem.

Here's the general rule of thumb: go with the default behavior of the DBMS, unless you find a reason not to. Anything else is a premature preoccupation with optimization on your part.

Walter Mitty
+1  A: 

I suggest that you follow the usual rules about indexing, which approximately means "create indexes on those columns that you use in your queries".

This might sound unnecessary with such a small database. As others have already said: as long as your database stays as small as you have described, the queries will be fast enough anyway, and the indexes aren't really needed. They can even slow down insertions and updates, but unless you have very specific requirements there, it doesn't matter with such a small database.

But, if the database grows (which databases sometimes have a tendency to do), you won't have to remember to add indexes to that old database you've probably forgotten about by then. Maybe it has even been installed at one of your customers, and you can't modify it!

I guess what I'm saying is this: indexes should be such a natural part of your database design, that it is the lack of indexes that is the optimization, premature or not.

Thomas Padron-McCarthy
A: 

Even if you have an index, SQL Server might not even use it, depending on the statistics for that table. And if you plan to put in an index for a report that will run at most a couple times a year, bear in mind that the INSERT/UPDATE penalties for adding the index will be in effect ALL THE TIME. Before adding an index, ask yourself if it is worth the performance penalty.
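One way to see whether an index is earning its keep in SQL Server (a sketch using the standard sys.dm_db_index_usage_stats DMV, available from SQL Server 2005 onwards):

    -- Compare how often each index is read (seeks/scans) with how often
    -- it is maintained (updates); an index that is maintained but never
    -- read is pure write penalty.
    SELECT OBJECT_NAME(s.object_id) AS table_name,
           i.name                   AS index_name,
           s.user_seeks, s.user_scans, s.user_updates
    FROM sys.dm_db_index_usage_stats AS s
        JOIN sys.indexes AS i
            ON i.object_id = s.object_id AND i.index_id = s.index_id
    WHERE s.database_id = DB_ID();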

+2  A: 

Knuth's wise words are not applicable to the creation (or not) of indexes, since by adding indexes you are not optimising anything directly: you are providing an index that the DBMS's optimiser may use to optimise some queries. In fact, you could better argue that deciding not to index a small table is premature optimisation, as by doing so you restrict the DBMS optimiser's options!

Different DBMSs will have different guidelines for choosing whether or not to index columns based on various factors including table size, and it is these that should be considered.

An example of premature optimisation in databases: "denormalising for performance" before any benchmarking has indicated that the normalised database actually has any performance issues.

Tony Andrews
+1  A: 

As a general rule of thumb, it's good to avoid smaller indexes as they typically won't be used.

But sometimes they can provide a huge boost as I outlined here.

Michael K Campbell
A: 

You have to understand that, depending on the query, two lookups may be done: one into the index to get the pointer to the row, and a second to the row itself. If the data being queried is entirely within the index's columns, that extra step may not be necessary.

It is entirely possible that double dipping for data is slower even when the optimizer goes after the index. Whether or not we care is up to application profiling and eventual explain plans.
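A covering index is the usual way to avoid that second lookup. A minimal sketch in SQL Server syntax, reusing the hypothetical Orders table from earlier:

    -- INCLUDE stores Total in the index's leaf pages, so this query is
    -- answered from the index alone, with no bookmark lookup into the table.
    CREATE NONCLUSTERED INDEX IX_Orders_OrderDate_Covering
        ON dbo.Orders (OrderDate)
        INCLUDE (Total);

    SELECT OrderDate, Total
    FROM dbo.Orders
    WHERE OrderDate >= '20080101';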

Xepoch