views:

1365

answers:

9

I am maintaining a pretty sizable application and database and am noticing some poor database performance in a few of our stored procedures.

I always hear that "adding an index" can be done to help performance. I am certainly no DBA, and I do not understand what indexes are, why the help, and how to create them.

I basically need an indexes 101.

Can anyone give me resources so that I can learn?

+4  A: 

An index basically sorts your data on the given columns and then stores that order, so when you want to find an item, the database can optimize by using binary search (or some other optimized way of searching), rather than looking at each individual row.

Thus, if the amount of data you are searching through is large, you will absolutely want to add some indexes.

Most databases have a tool to explain how your query will work (for db2, it's db2expln, something similar probably for sqlserver), and a tool to suggest indexes and other optimizations (db2advis for db2, again probably something similar for sqlserver).

Mike Stone
Most SQL databases implement some variant of "EXPLAIN <query>" to output a query plan. In PostgreSQL, "EXPLAIN ANALYZE <query>" additionally runs the query and times each part.
Nick Johnson
A: 

An index can be explained as a sorted list of the items in a register. It is very quick to lookup the position of the item in the register, by looking for it's key in the index. Next the the key in the index is a pointer to the position in the register where the rest of the record can be found.

You can have many indexes on a register, but the more you have, the slower inserting new records will be (because each index needs a new record as well - in a sorted order, which also adds time).

Ludvig A Norin
+16  A: 

Think of an index similar to a card catalog in the library. An index keeps you from having to search through every isle or shelf for a book. Instead, you may be able to find the items you want from a commonly used field, such as and ID, Name, etc. When you build an index the database basically creates something separate that a query could hit to rather than scanning the entire table. You speed up the query by allowing it to search a smaller subset of data, or an optimized set of data.

CodeRot
+1  A: 

Indices are created in an existing table to locate rows more quickly and efficiently. It is possible to create an index on one or more columns of a table, and each index is given a name. The users cannot see the indexes, they are just used to speed up queries.

Basically, your DBMS will create some sort of tree structure which points to the data (from one column) in a sorted manner. This way it is easier to search for data on that column(s).

http://en.wikipedia.org/wiki/Index_(database)

Correct Link: http://en.wikipedia.org/wiki/Index_(database)
Chris Ballance
I fixed the link in the question
Click Upvote
A: 

Indexes is a method which database systems use to quickly find data. The real world analogy are indexes in books. If an author/publisher does a good job at indexing their book, it becomes pretty easy for the reader to directly go to the page they want to read simply by looking at the index. Same goes for a database. If an index is created on a field, the database pre-sorts the data. When a request is made on the data, the database uses the index to identify which location the data is stored in on the hard disk, and directly goes there. If there are no indexes, the database needs to look at every record in order to find out if it meets the criteria(s) of your query.

A simple way to look at indexes is by thinking of a deck of cards. A database which is not indexed is like a deck a cards which have been shuffled. If you want to find the king of spades, you need to look at every card one by one to find it. You might be lucky and it can be the first one, or you might be unlucky and it can be the last one.

A database which is indexed, has all the cards in the deck ordered from ace to king and each suite is set aside in its own pile. Looking for the king of spades is much simpler now because you simply need to look at the bottom of the pile of cards which contains the spades.

I hope this helps. Be warned though that although indexes are necessary in a relational database system, they can counter productive if you write too many of them. There's a ton of great articles on the web that you can read up on indexes. I'd suggest doing some reading before you dive into them.

A: 

for mssql (and maybe others) the syntax looks like:

create index <indexname> on <tablename>(<column1>[,<column2>...])
TrevorD
+1  A: 

Some more index information!

Clustered indexes are the actual physical layout of the records in the table. Hence, you can only have one per table.

Nonclustered indexes are the aforementioned card catalog. Sure, the books are arranged in a particular order, but you can arrange the cards in the catalog by book size, or maybe by number of pages, or by alphabetical last name.

Something to think about -- creating too many indexes is a common pitfall. Every time your data gets updated your DB has to seek through that index and update it, inserting a record into every index on that table for that new row. In transactional systems (think: NYSE's stock transactions!) that could be an application killer.

nathaniel
+14  A: 

As a rule of thumb, indexes should be on any fields that you use in joins or where clauses (if they have enough different values to make using an index worthwhile, field with only a few possible values doesn't benefit from an index whci is why it is pointless to try ti indexa bit field).

If your structure has formally created primary keys (which it should, I never create a table without a primary key), those are by definition indexed becasue a primary key is required to have a unique index on it. People often forget that they have to index the foreign keys becasue an index is not automatically created when you set up the foreign key relationsship. Since the purpose of a foreign key is to give you a field to join on, all foreign keys should probably be indexed.

Indexes once created need to be maintained. If you have a lot of data change activity, they can get fragmented and slow performance and need to be refreshed. Read in Books online about indexes. YOU can also find the syntax fro the create index statement there.

Indexes are a balancing act, every index you add usually will add time to data inserts, updates and deletes but can potentially speed up selects and joins in complex inserts, updates and deletes. There is no one formula for what are the best indexes although the rule of thumb above is a good place to start.

HLGEM
+2  A: 

As previously stated, you can have a clustered index and multiple non-clustered indexes. In SQL 2005, you can also add additional columns to a non-clustered index, which can improve performance where a few commonly retrieved columns are included with the index but not part of the key, which eliminates a trip to the table altogether.

Your #1 tool for determining what your SQL Server database is doing is the profiler. You can profile entire workloads and then see what indexes it recommends. You can also look at execution plans to see what effects an index has.

The too-many indexes problem is due to writing into a database, and having to update all the indexes which would have a record for that row. If you're having read performance, it's probably not because of too many indexes, but too few, or too unsuitable.

Cade Roux