I don't see the point of a clustered index. When do we benefit from one?

A: 

Look here, halfway down the page it says:

Accessing a row through the clustered index is fast because the row data is on the same page where the index search leads. If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data using a different page from the index record. (For example, MyISAM uses one file for data rows and another for index records.)

Speed sounds like an excellent reason to me... or am I missing your point?

The advantage of the clustered index is that it can be accessed (and thus searched through) with fewer I/O operations than 'normal' indexes. Knowing this, you can optimize your database accesses, and thus your application, by placing the clustered index where it will benefit you most.

lexu
I think he's looking for a concrete example to wrap his head around it.
ceejayoz
+2  A: 

Clustered indexes

A clustered index means that the records are physically stored in order (at least near each other), based on the index. Clustered indexes are most important when you are retrieving various columns from each record, in order, because the database engine does not have to jump around to get the next record. Instead, the records are stored sequentially, therefore the seek time between records is at its minimum.

Clustered indexes are most important when reading multiple records that appear near each other in the index.

By default, with InnoDB, your primary index is a clustered index.

Use case for clustered indexes

If you were doing an incremental search like Google's or Yahoo's, where the first few records matching what you've typed so far appear as you type, performance is paramount. If you were returning just a single indexed column in the result set, you wouldn't need a clustered index, but let's pretend that you also want to return the number of hits for each key_word, forcing the database engine to access the actual row. Since you want to return sequential rows, you should store them sequentially for optimal performance.

SELECT key_word, hits FROM keywords
WHERE key_word LIKE 'britney s%'
ORDER BY key_word
LIMIT 10

You'd want your primary key (clustered index) to be on key_word.
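As a rough sketch of that setup, the table might be defined as follows in InnoDB. Only the `keywords` table and its `key_word` and `hits` columns come from the query above; the column types and sizes are assumptions made for illustration.

```sql
-- Hypothetical schema: column types and sizes are assumptions.
CREATE TABLE keywords (
    key_word VARCHAR(100) NOT NULL,
    hits     INT UNSIGNED NOT NULL DEFAULT 0,
    -- In InnoDB the primary key is the clustered index, so rows are
    -- stored in key_word order and matching prefixes sit close together.
    PRIMARY KEY (key_word)
) ENGINE=InnoDB;

-- The prefix search reads one contiguous range of the clustered index,
-- and hits is read from the same pages without a second lookup.
EXPLAIN
SELECT key_word, hits
FROM keywords
WHERE key_word LIKE 'britney s%'
ORDER BY key_word
LIMIT 10;
```

With this layout, EXPLAIN would be expected to report PRIMARY in its key column, although the exact plan depends on the data and the MySQL version.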

Comparison to nonclustered indexes

All indexes are physically stored in order (in a B-tree, strictly speaking), so if you are returning just the column that is stored in the index, you still get the same benefit. That is because the indexed column's actual value is stored in the index, so MySQL can use the index value instead of reading the record. However, once you start retrieving columns that aren't part of the index, you'd also want the actual records stored in order, as they are with a clustered index.
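To make the difference concrete, here is a minimal sketch of the nonclustered case. The table name `keywords_by_id`, the surrogate `id` column, the index name, and all column types are hypothetical and not taken from the original example.

```sql
-- Hypothetical variant: surrogate primary key, key_word in a secondary index.
CREATE TABLE keywords_by_id (
    id       INT UNSIGNED NOT NULL AUTO_INCREMENT,
    key_word VARCHAR(100) NOT NULL,
    hits     INT UNSIGNED NOT NULL DEFAULT 0,
    PRIMARY KEY (id),
    KEY idx_key_word (key_word)
) ENGINE=InnoDB;

-- Covered by idx_key_word: only the indexed column is returned,
-- so the query can be answered from the index alone.
SELECT key_word
FROM keywords_by_id
WHERE key_word LIKE 'britney s%'
ORDER BY key_word
LIMIT 10;

-- Not covered: hits is stored only in the row, so each match needs an
-- extra lookup in the clustered index (by id), possibly on scattered pages.
SELECT key_word, hits
FROM keywords_by_id
WHERE key_word LIKE 'britney s%'
ORDER BY key_word
LIMIT 10;
```

For the first query, EXPLAIN typically shows "Using index" in its Extra column, which is MySQL's way of saying the rows themselves were not read.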

MySQL Documentation on clustered indexes

Accessing a row through the clustered index is fast because the row data is on the same page where the index search leads. If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data using a different page from the index record. (For example, MyISAM uses one file for data rows and another for index records.)

In InnoDB, the records in nonclustered indexes (also called secondary indexes) contain the primary key columns for the row that are not in the secondary index. InnoDB uses this primary key value to search for the row in the clustered index. If the primary key is long, the secondary indexes use more space, so it is advantageous to have a short primary key.

MySQL Clustered and Secondary Indexes
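The point about secondary indexes carrying the primary key can be sketched with a small example. The `users` table, its columns, and the index name are hypothetical and chosen only for illustration.

```sql
-- Hypothetical table: names and types are assumptions.
CREATE TABLE users (
    id    INT UNSIGNED NOT NULL AUTO_INCREMENT,
    email VARCHAR(255) NOT NULL,
    name  VARCHAR(100) NOT NULL,
    PRIMARY KEY (id),        -- short 4-byte key, used as the clustered index
    KEY idx_email (email)    -- each entry stores email plus the id of its row
) ENGINE=InnoDB;

-- idx_email already carries id, so no clustered index lookup is needed.
SELECT id FROM users WHERE email = 'a@example.com';

-- name is stored only in the row: InnoDB finds id through idx_email,
-- then searches the clustered index by that id to read the row.
SELECT name FROM users WHERE email = 'a@example.com';
```

Had the primary key been a long string instead of a 4-byte integer, every entry in idx_email would grow by that amount, which is the space cost the documentation warns about.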

Marcus Adams
Will InnoDB use the unique index as the clustered index if there is no primary key? Or will it still use the invisible internal primary key?
symfony
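@symfony, only if the unique index has all NOT NULL columns: InnoDB clusters on the first such UNIQUE index, and otherwise falls back to a hidden internal row ID index. That is why it is best to have a primary key on each table with InnoDB.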
Marcus Adams
To your updated example: I think a normal index is enough, since the result set has only 10 records, which is not a great deal. Is that right, or am I misunderstanding it?
symfony
It's not the result set that matters so much, but the number of rows in the table. If there are many rows, even though you're only returning 10, you'd still want a clustered index.
Marcus Adams
It seems this is where our opinions differ. Can you prove it? I think it's the result set that matters, because the column in the `where` clause is covered by the index. If the query also refers to a column that is not contained in the index, then you are right, the table size matters. Is this right?
symfony
The columns that are used in the result set are important in determining if a clustered index is needed, but not the number of rows in it. So, you're right that the query that I used as an example would work just as well with an unclustered index. I'll update the example to better demonstrate. Thanks.
Marcus Adams
@symfony, I've added the hits column to the result set to make a clustered index better fit the bill. Sorry for any confusion. Again, it's not the result set size that matters, it's whether you're accessing the rows.
Marcus Adams
Can you prove it? I don't see any reason MySQL would behave like that. IMO, MySQL fetches the rows according to the `where` clause, and after that it fetches the required columns. In other words, it's the columns used by the `where` clause that matter.
symfony
@Mark_Carrington also agrees with my conclusion under this answer: http://stackoverflow.com/questions/2499306/why-is-the-index-on-s-not-used-for-sort-here/2499394#2499394
symfony
@symfony, fetching rows in the WHERE clause and in the SELECT clause both matter, and the columns in the WHERE clause definitely matter more, which is why we use indexes in the first place. However, when you start fetching rows as in the SELECT statement, that's when the clustered index pays off. I don't see any posts by Mark where he disagrees with my comments. Anyway, my best proof is to quote MySQL documentation on clustered indexes, which I'll add to my answer.
Marcus Adams
Our conclusions differ on one point: is it the size of the **returned result set** that matters, or the size of the **matched result set**? Do you know that MySQL stops fetching as soon as it has collected as many records as `limit` specifies?
symfony
+1  A: 

The best example I can think of is a reporting table that is queried regularly on date of transaction(s). I would put a clustered index on the TransactionDate column and add any other required indexes based on query optimization.

So queries like SELECT SUM(amount) FROM transactiondetails WHERE TransactionDate > '2010-01-01' AND TransactionDate < '2010-02-01' will use the clustered index to do seeks and will return results more efficiently.
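In InnoDB specifically, the clustered index is always the primary key, so one way to get this layout is a composite primary key that starts with the date. The sketch below assumes a hypothetical `id` column and guessed column types; only the table name, `TransactionDate`, and `amount` come from the query above.

```sql
-- Hypothetical schema: the id column and all types are assumptions.
CREATE TABLE transactiondetails (
    id              BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    TransactionDate DATETIME NOT NULL,
    amount          DECIMAL(12,2) NOT NULL,
    -- Dates are not unique, so id is appended to make the key unique
    -- while rows are still stored in TransactionDate order.
    PRIMARY KEY (TransactionDate, id),
    -- An AUTO_INCREMENT column must be the first column of some index.
    KEY idx_id (id)
) ENGINE=InnoDB;

-- The date range maps to one contiguous run of rows in the clustered index,
-- and amount is read from those same pages.
SELECT SUM(amount)
FROM transactiondetails
WHERE TransactionDate > '2010-01-01'
  AND TransactionDate < '2010-02-01';
```

The trade-off is that every secondary index on this table now stores the wider (TransactionDate, id) key, so this layout suits tables that are mostly read by date range.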

Raj More
I don't think that's a reason to use a clustered index. A normal index on TransactionDate is enough, isn't it?
symfony
@symfony, only if the index were a compound index and included both transactionDate and amount. Otherwise, the disk still has to bounce around to retrieve the amount column for each record.
Marcus Adams
A clustered index actually orders the physical data according to the index (which is why you can usually only have 1 clustered index per table). This makes the scan very efficient because it can just scan the pages sequentially off the disk. Using a normal index the data pages are scattered around and the seek time from page to page becomes expensive.
Mike Q
@symfony, to be more clear, if he were retrieving just one amount, an unclustered index would yield similar performance. However, since he's actually reading multiple amounts, it's best for them to be near each other.
Marcus Adams
@Marcus Adams, can you confirm or deny my conclusions under this answer: http://stackoverflow.com/questions/2499306/why-is-the-index-on-s-not-used-for-sort-here/2499394#2499394
symfony
@symfony, Sure. I posted my answer to your question. I hope this helps.
Marcus Adams
+1  A: 

A real address book (a dead tree edition), ordered by first name, resembles a clustered index in its structure and purpose.

Clustered indexes can greatly increase overall speed of retrieval, but usually only where the data is accessed sequentially in the same or reverse order of the clustered index, or when a range of items are selected.

Since the physical records are in this sort order on disk, the next row item in the sequence is immediately before or after the last one, and so fewer data block reads are required.

Source: Wikipedia: Database Index - Clustered

Daniel Vassallo
Will `explain` show any special information when it's using the clustered index?
symfony
A: 

With a clustered index the rows are stored physically on the disk in the same order as the index. There can therefore be only one clustered index.

See the original answer on Stack Overflow

Achim Tromm