What does the CREATE INDEX statement do? There is an example at the end of this schema:

CREATE TABLE tags (
  tag_id                       varchar(255) NOT NULL, 
  "{Users}{userID}question_id" int4 NOT NULL, 
  tag                          varchar(20), 
  CONSTRAINT tag 
    PRIMARY KEY (tag_id));
CREATE INDEX tags_tag 
  ON tags (tag);
+3  A: 

An index is a database structure that can help speed access to individual rows of the database, when searching based on the field(s) in the index.

In your example, the CREATE INDEX statement creates an index named tags_tag on the table tags using the column tag. If you search for rows in the table based on the tag field, the database can use the index to look up the matching rows more efficiently. Without an index, the database might have to resort to a full scan of the table, which can take much longer (depending on many factors, like the size of the table, the distribution of values, and the exact query criteria). Different databases also support different types of indexes, which can be used to search for data in different ways.
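To see the difference between a full scan and an index lookup, here is a minimal sketch. It assumes SQLite through Python's built-in sqlite3 module as a stand-in for whatever database you are using, and it shortens the second column name to question_id; the helper name plan is made up for this example. Other databases have their own EXPLAIN output, so treat the exact wording of the plans as SQLite-specific.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tags ("
    " tag_id varchar(255) NOT NULL PRIMARY KEY,"
    " question_id int NOT NULL,"  # simplified column name (assumption)
    " tag varchar(20))"
)

def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the detail text is the last column

query = "SELECT * FROM tags WHERE tag = 'sql'"

plan_before = plan(query)  # no index on tag yet: expect a table scan
conn.execute("CREATE INDEX tags_tag ON tags (tag)")
plan_after = plan(query)   # the planner can now search via tags_tag

print(plan_before)
print(plan_after)
```

The first plan should describe a scan of tags, and the second should mention the tags_tag index by name.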

There is also a disadvantage to indexes: every index slows down writes to that table. When you insert a row, the database has to update each index in addition to writing the row itself.
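If you want to measure that write cost yourself, a rough sketch along these lines may help. It again assumes SQLite via Python's sqlite3 module; the function name time_inserts, the row count and the generated tag values are all invented for illustration. Timings vary between machines and runs, so compare the two numbers on your own setup instead of relying on any particular ratio.

```python
import sqlite3
import time

def time_inserts(with_index, n_rows=20000):
    """Bulk-insert n_rows into a fresh in-memory table; return (seconds, row count)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tags (tag_id varchar(255) NOT NULL PRIMARY KEY,"
        " question_id int NOT NULL, tag varchar(20))"
    )
    if with_index:
        conn.execute("CREATE INDEX tags_tag ON tags (tag)")
    # Made-up sample data: 500 distinct tags spread across the rows.
    rows = [(f"id{i}", i, f"tag{i % 500}") for i in range(n_rows)]
    start = time.perf_counter()
    with conn:  # run all inserts in a single transaction
        conn.executemany("INSERT INTO tags VALUES (?, ?, ?)", rows)
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT count(*) FROM tags").fetchone()[0]
    conn.close()
    return elapsed, count

plain_seconds, plain_count = time_inserts(with_index=False)
indexed_seconds, indexed_count = time_inserts(with_index=True)
print(f"without index: {plain_seconds:.4f}s for {plain_count} rows")
print(f"with index:    {indexed_seconds:.4f}s for {indexed_count} rows")
```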

Deciding which columns to put an index on can be tricky, and as always, benchmarks or real-world queries against real-world data are the most accurate way of measuring performance. In general you will want indexes on columns that you will be searching on. So if you are likely to want to look up a row by tag then it definitely makes sense to put an index there. But if you have an address book, you are (probably) not going to need to search on street, ZIP/postcode or phone number so it is not going to be worth the write performance hit.

Your primary key column(s) will almost always have an index automatically generated by the database. And if you want to keep the values of a particular column unique, a UNIQUE INDEX can be created to enforce this.
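Both points can be checked with a short sketch, again assuming SQLite via Python's sqlite3 module as a stand-in. The index name tags_tag_unique is invented, and making tag unique is only for illustration: in a real tags table the same tag would normally appear on many questions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tags (tag_id varchar(255) NOT NULL PRIMARY KEY,"
    " question_id int NOT NULL, tag varchar(20))"
)

# List the indexes SQLite created on its own; the second column is the name.
auto_indexes = [row[1] for row in conn.execute("PRAGMA index_list('tags')")]
print(auto_indexes)  # the automatic index backing the primary key

# Hypothetical unique index, purely to show how uniqueness is enforced.
conn.execute("CREATE UNIQUE INDEX tags_tag_unique ON tags (tag)")
conn.execute("INSERT INTO tags VALUES ('a', 1, 'sql')")

try:
    conn.execute("INSERT INTO tags VALUES ('b', 2, 'sql')")  # duplicate tag
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)
```

The second insert uses a different primary key but the same tag, so it is the unique index that rejects it.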

This SO question asks about rules of thumb for database indexes, which may be useful.

Adam Batkin
Are there any rules of thumb for when you should use an index? I picked the index in the example rather intuitively, because access speed is probably the bottleneck for SO-style tags.
Masi
Does this count as an index? I store md5deep(question_id, user_id, time) hashes as the primary key of a table, rather than querying each value separately.
Masi
So it speeds up access time, with the tradeoff of increased write time.
Masi