views:

616

answers:

8

There have been several questions recently about database indexing and clustered indexing and it has been kind of new to me until the last couple weeks. I was wondering how important it is and what kind of performance gains can be expected from creating them.

Edit: What is usually the best type of fields to look at when putting in a clustered index when you are first starting out?

+7  A: 

Very veryA(G,G) important. In my opinion, wise indexing is the absolute most important thing in DB performance optimization.

This is not an easy topic to cover in a single answer. Good indexing requires knowledge of queries going to happen on the database, making a large number of trade-offs and understanding the implication of a specific index in the specific DB engine. But it's very important nevertheless.

EDIT: Basically, clustered indexes usually should have short lengths. They should be created on queries which reflect a range. They should not have duplicate entries. But these guidelines are very general and by no means the right thing. The right thing is to analyze the queries that are gonna be executed. Carefully benchmarking and analyzing execution plans and understanding what is the best way to do it. This requires years of experience and knowledge and by no means it's something to explain in a single paragraph. It's the primary thing that makes DB experts expert (It's not the only thing, but it's primitive to other important things, such as concurrency issues, availability, ...)!

Mehrdad Afshari
This cannot be overstated either, especially for tables with a lot of rows. It the difference between a table scan and a binary sort. For a 200k row table, that a difference between a maximum of 200k versus just 19 reads.
ctacke
ctacke: Very true. I updated the count of "very"s to reflect your comment!
Mehrdad Afshari
thank you! This has been a great help
jmein
+1  A: 

Without proper indexes, you force the RDBMS to do table scans to query for anything. Terribly inefficient.

I'd also infer that you don't have primary keys, which is a cardinal sin in relational design.

duffymo
A: 

@duffmyo we have primary keys on every table but from what I have heard it is not always the best solution to have a clustered index on your primary key, but would be of more use on a timestamp field

jmein
The clustered index is simply the physical sort order of the table. It's "better" than a non-clustered index because it takes up no added space. Since a table can have only one physical layout, you can only have one clustered index.
ctacke
From a perf perspective clustered v. non-clustered is negligible (non-clustered requires a final row lookup in the actual table).
ctacke
but the clustered index does not have to be on the primary key if you create it manually you can put it on whatever field that you want it on
jmein
I would recommend staying away from guids though
jmein
+2  A: 

Indexing: extremely important. Having the wrong indexes makes queries harder, sometimes to the point they can't be completed in a sensible time.

Indexes also impact insert performance and disc usage (negatively), so keeping lots of superfluous indexes around on large tables is a bad idea too.

Clustering is something worth thinking about, I think it's really dependent on the behaviour of the specific database. If you can cluster your data correctly, you can dramatically reduce the amount of IOPs required to satisfy requests for rows not in memory.

MarkR
+1  A: 

Indexing is very important when the table contains many rows.
With a few rws, performance is better without indexes.
With larger tables indexes are very important to get good performance.
It is not easy to defined them. Clustered means that the data are stored in the clustered index order.
To get good hints of indexes you could use Toad

Kb
+1  A: 

Indexing is vitally important.

The right index for a query can improve performance so dramatically it can seem like witchcraft.

Garry Shutler
A: 

As the other answers have said, indexing is crucial.

As you might infer from other answers, clustered indexing is much less crucial.

Decent indexing gives you first order performance gains - orders of magnitude are common.

Clustered indexing is a second order or incremental performance gain - usually giving small (<100%) percentages of performance increase.

(We also get into questions of 'what is a 100% performance gain'; I'm interpreting the percentage as ((oldtime - newtime)/newtime) * 100, so if the old time is 10 seconds and the new time is 5 seconds, the performance increase is 100%.)

Different DBMS have different interpretations of what a clustered index means. Beware. In particular, some DBMS cluster the data once and thereafter, the clustering decays over time until the data is reclustered. Others take a more active view of clustering, I believe.

Jonathan Leffler
A: 

The clustered index is ususally but not always your primary key. One way of looking at a clustered index is to think of the data being physically ordered based on the values of the clustered index.

This may very well not be the case in reality however refrencing clustered indexes ususally gets you the following performance bonuses anyway:

  1. All columns of the table are accessable for free when resolved from a clustered index hit as if they were contained within a covering index. (A query resolvable using just the index data without having to refrence the data pages of the table itself)

  2. Update operations can be made directly against a clustered index without intermediate processing. If you are doing a lot of updates against a table you ususally want to be refrencing the clustered columns.

  3. Depending on implementation there may be a sequential access benefit where data stored on disk is retreived quicker with fewer expensive disk seek operations.

  4. Depending on implementation there may be free index benefit where a physical index is not necessary as data access can be resolved via simple guessing game algorithms.

Don't count on #3 and especially #4. #1 and #2 are ususally safe bets on most RDBMS platforms.

Einstein