Hello, I am going to index my database tables. The rule I am currently following is to index the columns that are most used in WHERE clauses in my queries. Is that the right rule, or are there other rules or checks I should follow? And how would indexing affect the database as a whole?

A: 

You can also use indexes to prevent duplicate data, that is, you can create unique indexes. This is useful when you have some column that's not part of the primary key, but it still needs to be unique. A good example is when you are using a surrogate key as the primary key on your table, but some other column needs to also be unique.
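A minimal sketch of that surrogate-key case, using Python's built-in `sqlite3` module only because SQLite ships with Python (the `users` table, `email` column and index name are hypothetical). The primary key is the surrogate `id`, and a unique index on `email` rejects a second row with the same address.

```python
import sqlite3


def insert_duplicate_email() -> str:
    """Return 'rejected' if the unique index blocks a duplicate email, else 'accepted'."""
    con = sqlite3.connect(":memory:")
    # Surrogate key as the primary key; email is not part of the key.
    con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
    # Unique index: no two rows may share the same email.
    con.execute("CREATE UNIQUE INDEX ux_users_email ON users (email)")
    con.execute("INSERT INTO users (email) VALUES (?)", ("a@example.com",))
    try:
        con.execute("INSERT INTO users (email) VALUES (?)", ("a@example.com",))
    except sqlite3.IntegrityError:
        return "rejected"
    finally:
        con.close()
    return "accepted"


if __name__ == "__main__":
    print(insert_duplicate_email())  # rejected
```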

As for how indexing affects the database as a whole, indexes are used primarily for data integrity assurance and performance. Where indexes can cause problems is when loading lots of data (e.g. via bulk load or other means), since each time you load a record the indexes have to be updated. There are ways to turn index maintenance off, which is often done when large volumes of data are being loaded, but the trade-off is that you have to ensure the data is correct: if it's not, re-enabling the indexes will fail (for example, a unique index cannot be rebuilt over duplicate values).
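SQLite has no switch to disable an index, so this sketch uses a common stand-in: drop the index, bulk-load, then recreate it (the table, index name and sample rows are hypothetical). If the loaded data contains duplicates, recreating the unique index fails, which mirrors the re-enable failure described above.

```python
import sqlite3


def reload_with_index_rebuild(rows: list[tuple[str]]) -> bool:
    """Bulk-load rows with the unique index dropped; return True if it can be rebuilt."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
    con.execute("CREATE UNIQUE INDEX ux_users_email ON users (email)")

    # Drop the index so the load does not pay per-row index maintenance.
    con.execute("DROP INDEX ux_users_email")
    with con:  # one transaction for the whole load
        con.executemany("INSERT INTO users (email) VALUES (?)", rows)

    # Rebuilding checks every row at once; duplicates make it fail.
    try:
        con.execute("CREATE UNIQUE INDEX ux_users_email ON users (email)")
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        con.close()


if __name__ == "__main__":
    print(reload_with_index_rebuild([("a@example.com",), ("b@example.com",)]))
    print(reload_with_index_rebuild([("a@example.com",), ("a@example.com",)]))
```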

As for whether adding more indexes affects performance, it depends on how many indexes you already have. The more indexes you have, the more work the optimizer has to do in deciding which index to use. It also depends on the table size, the number of columns, etc.

One thing I highly recommend when determining index impact is to look at the query plan to determine which index is being used. If you are trying to make a given query run faster, this will tell you if the index helped.
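In SQLite the query plan is shown by `EXPLAIN QUERY PLAN` (MySQL uses `EXPLAIN`). This sketch, with a hypothetical `orders` table and index name, compares the plan for the same query before and after adding an index. The exact wording of the plan text varies between SQLite versions, so it is not hard-coded here.

```python
import sqlite3


def plan(con: sqlite3.Connection, sql: str, params: tuple = ()) -> list[str]:
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql, params)]


def compare_plans() -> tuple[list[str], list[str]]:
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, total REAL)")
    query = "SELECT id, total FROM orders WHERE status = ?"
    before = plan(con, query, ("shipped",))  # no index yet: full table scan
    con.execute("CREATE INDEX ix_orders_status ON orders (status)")
    after = plan(con, query, ("shipped",))   # now a search through the index
    con.close()
    return before, after


if __name__ == "__main__":
    before, after = compare_plans()
    print("before:", before)
    print("after: ", after)
```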

dcp
I have a few columns of type varchar in my WHERE clauses. Would indexing them affect performance?
sai
See my latest edit.
dcp
A: 

You are right to use indexes on fields that are frequently used in WHERE clauses. A few other places to use them:

  1. Foreign key fields (that you use for joins).
  2. Fields that you use for ORDER BY.
  3. Fields that you use for GROUP BY.
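For items 2 and 3, one way to see the benefit in SQLite is whether the plan needs a temporary B-tree to sort or group the rows. This sketch uses a hypothetical `orders` table and index name. Both queries read only `customer_id`, so after the index exists SQLite can walk the index in order instead of sorting.

```python
import sqlite3


def plan(con: sqlite3.Connection, sql: str) -> list[str]:
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]


def sort_steps() -> dict[str, list[str]]:
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    queries = {
        "order_by": "SELECT customer_id FROM orders ORDER BY customer_id",
        "group_by": "SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id",
    }
    result = {}
    for name, sql in queries.items():
        result[name + "_before"] = plan(con, sql)  # expect a temp B-tree sort step
    con.execute("CREATE INDEX ix_orders_customer ON orders (customer_id)")
    for name, sql in queries.items():
        result[name + "_after"] = plan(con, sql)   # rows come out of the index in order
    con.close()
    return result


if __name__ == "__main__":
    for key, details in sort_steps().items():
        print(key, details)
```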

As for how indexes would affect your database, they will (generally) make queries faster, inserts and updates a bit slower, and of course increase the size of the database. Assuming that database size isn't a concern, it usually comes down to a tradeoff between query performance and insert performance.

Eric Petroelje
OK, I have a few columns of type varchar in my WHERE clauses. Would indexing them affect performance?
sai
@sai - Yes, it would probably make them faster, but there are other factors involved as well (how big the table is, what other columns are used in the WHERE clause, etc.). If you are wondering about a specific query, I would create another question and post your code and schema to get a more thought-out answer.
Eric Petroelje
+1: The SELECT clause is another place where indexes can be used, but it's the last consideration, partly because there is a ceiling on index key length in MySQL (1000 bytes for MyISAM tables and 767 bytes for InnoDB): http://dev.mysql.com/doc/refman/4.1/en/create-index.html
OMG Ponies
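Indexing for the SELECT clause usually means a covering index: if every column the query needs is in the index, the table rows never have to be read. A sketch in SQLite with a hypothetical `users` table and index name. The MySQL length limits from the comment are not reproduced here.

```python
import sqlite3


def plan(con: sqlite3.Connection, sql: str, params: tuple = ()) -> list[str]:
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql, params)]


def covering_demo() -> tuple[list[str], list[str]]:
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, last_name TEXT, email TEXT, phone TEXT)"
    )
    # The index holds the filtered column and one selected column.
    con.execute("CREATE INDEX ix_users_name_email ON users (last_name, email)")
    # email is in the index: no table lookup needed.
    covered = plan(con, "SELECT email FROM users WHERE last_name = ?", ("Smith",))
    # phone is not in the index: each match still requires a table lookup.
    not_covered = plan(con, "SELECT phone FROM users WHERE last_name = ?", ("Smith",))
    con.close()
    return covered, not_covered


if __name__ == "__main__":
    covered, not_covered = covering_demo()
    print(covered)
    print(not_covered)
```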
I have a question: do we need to index foreign key columns separately? Aren't they indexed already?
sai
@sai - Yes, in MySQL they should be created automatically for InnoDB tables. I was just including that for completeness' sake.
Eric Petroelje
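That automatic index is MySQL/InnoDB behavior, and not every database does it. SQLite, for example, does not create an index on a foreign key column. This sketch uses hypothetical `customers`/`orders` tables and lists the indexes on the child table with `PRAGMA index_list` before and after indexing `customer_id` by hand.

```python
import sqlite3


def child_indexes() -> tuple[list[str], list[str]]:
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # enforce the constraint in SQLite
    con.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    con.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
        "customer_id INTEGER REFERENCES customers(id))"
    )
    # PRAGMA index_list rows are (seq, name, unique, origin, partial).
    before = [row[1] for row in con.execute("PRAGMA index_list('orders')")]
    con.execute("CREATE INDEX ix_orders_customer ON orders (customer_id)")
    after = [row[1] for row in con.execute("PRAGMA index_list('orders')")]
    con.close()
    return before, after


if __name__ == "__main__":
    print(child_indexes())  # ([], ['ix_orders_customer'])
```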
A: 

There are two reasons to index: performance of queries/DML and enforcing constraints.

For the former:

Indexes have to be maintained: new entries have to be inserted, changed ones moved, deleted ones removed. For each DML operation against a table, there is a corresponding operation against every index as well. A table with 8 indexes means roughly 8x more work per DML operation.
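Each index is a separate structure that every inserted row must also be written into. One rough, observable proxy in SQLite is the number of pages used after loading identical rows into a table with 0, 1 and 8 indexes. This measures storage written, not elapsed time, and the eight-integer-column table is hypothetical.

```python
import sqlite3

COLUMNS = [f"c{i}" for i in range(8)]  # hypothetical table: eight integer columns


def pages_used(index_count: int, rows: int = 5000) -> int:
    """Load identical rows into a table carrying `index_count` single-column
    indexes and return how many database pages SQLite ends up using."""
    con = sqlite3.connect(":memory:")
    col_defs = ", ".join(f"{c} INTEGER" for c in COLUMNS)
    con.execute(f"CREATE TABLE t (id INTEGER PRIMARY KEY, {col_defs})")
    for col in COLUMNS[:index_count]:
        con.execute(f"CREATE INDEX ix_{col} ON t ({col})")
    data = [tuple((n * (i + 1)) % 997 for i in range(len(COLUMNS))) for n in range(rows)]
    placeholders = ", ".join("?" for _ in COLUMNS)
    with con:  # one transaction for the whole load
        con.executemany(
            f"INSERT INTO t ({', '.join(COLUMNS)}) VALUES ({placeholders})", data
        )
    pages = con.execute("PRAGMA page_count").fetchone()[0]
    con.close()
    return pages


if __name__ == "__main__":
    for k in (0, 1, 8):
        print(k, "indexes ->", pages_used(k), "pages")
```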

In an index, data "has a place to go": 2 must go between 1 and 3. If there's no room in that block for the new value, the block has to be split (read: overhead).
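A toy model of that "place to go" idea. It is not how any particular database lays out its index pages. Keys live in sorted blocks with a hypothetical capacity of 3. Inserting 2 into the full block `[1, 3, 5]` puts it between 1 and 3, the block overflows, and it is split in two.

```python
from bisect import insort

BLOCK_CAPACITY = 3  # hypothetical: real index blocks hold many more entries


def insert_key(blocks: list[list[int]], key: int) -> int:
    """Insert key into sorted fixed-capacity blocks; return the number of splits (0 or 1)."""
    # The key belongs in the last block whose first key is <= key (or the first block).
    target = 0
    for i, candidate in enumerate(blocks):
        if candidate and candidate[0] <= key:
            target = i
    block = blocks[target]
    insort(block, key)  # the key goes to its sorted position inside the block
    if len(block) <= BLOCK_CAPACITY:
        return 0
    # No room: split the overfull block into two halves (the overhead).
    mid = len(block) // 2
    blocks[target:target + 1] = [block[:mid], block[mid:]]
    return 1


if __name__ == "__main__":
    blocks = [[1, 3, 5]]
    print(insert_key(blocks, 2), blocks)  # 1 [[1, 2], [3, 5]]
    print(insert_key(blocks, 4), blocks)  # 0 [[1, 2], [3, 4, 5]]
```

The second insert finds free room in its block, so no split is needed.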

You shouldn't index every column likely to be in a WHERE clause. Low-cardinality columns, or range scans across highly scattered data, will usually not use an index. Most of the time an RDBMS can use only one index per table in a query (there are some cases where indexes can be combined with each other), so some indexes will need to be on multiple columns.

COMMENT RESPONSE:

First, suppose you have a WHERE clause with colA and colB in it, and you have two indexes, one on colA and another on colB. It's likely that the optimizer will choose one or the other, based on the selectivity of the predicate and of the index itself. The other predicate will just be a filter on the results of the index scan and the resulting table access. You'll be plowing through many more table blocks than needed, and if neither index is selective enough, you'll wind up with a full table scan (FTS) anyway.
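A sketch of this in SQLite, with a hypothetical table `t` and index names `ix_a`/`ix_b`. SQLite does not intersect two indexes for an AND, so the plan names exactly one of them. Which one it picks is up to the optimizer, so the code only counts how many are used.

```python
import sqlite3


def plan(con: sqlite3.Connection, sql: str, params: tuple = ()) -> list[str]:
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql, params)]


def indexes_used_for_and() -> list[str]:
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE t (id INTEGER PRIMARY KEY, col_a INTEGER, col_b INTEGER, payload TEXT)"
    )
    # Two separate single-column indexes.
    con.execute("CREATE INDEX ix_a ON t (col_a)")
    con.execute("CREATE INDEX ix_b ON t (col_b)")
    details = plan(con, "SELECT payload FROM t WHERE col_a = ? AND col_b = ?", (1, 2))
    con.close()
    # Report which of the two indexes the plan mentions.
    return [name for name in ("ix_a", "ix_b") if any(name in d for d in details)]


if __name__ == "__main__":
    print(indexes_used_for_and())
```

The predicate on the unused column is then applied as a filter to each row fetched through the chosen index.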

But if you frequently need colA and colB together in WHERE clauses, you can build one index on both columns. Now both predicates will be used to limit the resulting table block access to only those blocks that contain a needed row.

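Now the leading column becomes important: an index on (colA, colB) can serve a predicate on colA alone, but generally not one on colB alone.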
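A sketch of the composite index in SQLite (hypothetical table `t` and index `ix_a_b`, with `col_a` as the leading column). The index serves both predicates together and `col_a` alone. In this setup, with no `ANALYZE` statistics, a query on `col_b` alone falls back to a full scan.

```python
import sqlite3


def plan(con: sqlite3.Connection, sql: str, params: tuple = ()) -> list[str]:
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql, params)]


def composite_index_plans() -> dict[str, list[str]]:
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE t (id INTEGER PRIMARY KEY, col_a INTEGER, col_b INTEGER, payload TEXT)"
    )
    # One index on both columns; col_a is the leading column.
    con.execute("CREATE INDEX ix_a_b ON t (col_a, col_b)")
    sql = "SELECT payload FROM t WHERE {}"
    plans = {
        "a_and_b": plan(con, sql.format("col_a = ? AND col_b = ?"), (1, 2)),
        "a_only": plan(con, sql.format("col_a = ?"), (1,)),
        "b_only": plan(con, sql.format("col_b = ?"), (2,)),
    }
    con.close()
    return plans


if __name__ == "__main__":
    for name, details in composite_index_plans().items():
        print(name, details)
```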

Stephanie Page
Could you shed some light on indexing multiple columns?
sai
http://www.amazon.com/Relational-Database-Index-Design-Optimizers/dp/0471719994 is the bible for index design.
Stephanie Page