I don't see the point of clustered index, when will we benefit?
views:
228answers:
5Look here, half way down the page it says:
Accessing a row through the clustered index is fast because the row data is on the same page where the index search leads. If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data using a different page from the index record. (For example, MyISAM uses one file for data rows and another for index records.)
speed sounds like an excellent reason to me .. or am missing your point?
The advantage of the clustered index is that it can be accessed (and thus searched through) with fewer io operations than 'normal' indexes. Knowing this you can optimize your DB accesses and thus your application, by placing the clustered index where it will benefit you most.
Clustered indexes
A clustered index means that the records are physically stored in order (at least near each other), based on the index. Clustered indexes are most important when you are retrieving various columns from each record, in order, because the database engine does not have to jump around to get the next record. Instead, the records are stored sequentially, therefore the seek time between records is at its minimum.
Clustered indexes are most important when reading multiple records that appear near each other in the index.
By default, with InnoDB, your primary index is a clustered index.
Use case for clustered indexes
If you were doing an incremental search like the Google and Yahoo search, where as you start typing, you see the first few records that match what you've typed so far, performance is paramount. If you were returning just a single indexed column in the result set, you wouldn't need a clustered index, but let's pretend that you also want to return the number of hits for each key_word, forcing the database engine to access the actual row. Since you want to return sequential rows, you should store them sequentially for optimal performance.
SELECT key_word, hits FROM keywords
WHERE key_word LIKE 'britney s%'
ORDER BY key_word
LIMIT 10
You'd want your primary key (clustered index) to be on key_word
.
Comparison to nonclustered indexes
All indexes are physically stored in order (a btree actually, but basically), so if you are returning just the column that is stored in the index, you're still getting the same benefit. That is because the indexed column's actual value is stored in the index, therefore MySQL will use the index value instead of reading the record. However, if you start retrieving columns that aren't part of the index, this is where you'd also want the actual records stored in order, such as they are with a clustered index.
MySQL Documentation on clustered indexes
Accessing a row through the clustered index is fast because the row data is on the same page where the index search leads. If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data using a different page from the index record. (For example, MyISAM uses one file for data rows and another for index records.)
In InnoDB, the records in nonclustered indexes (also called secondary indexes) contain the primary key columns for the row that are not in the secondary index. InnoDB uses this primary key value to search for the row in the clustered index. If the primary key is long, the secondary indexes use more space, so it is advantageous to have a short primary key.
The best example I can think of is a reporting table that is queried regularly on date of transaction(s). I would put a clustered index on the TransactionDate column and add any other required indexes based on query optimization.
So queries like select sum (amount) from transactiondetails where TransactionDate > 'jan 01 2010' and TransactionDate < 'feb 01 2010'
will use the clustered index to do seeks and will come up with results in a more efficient way.
A real address book (a dead tree edition), ordered by first name, resembles a clustered index in its structure and purpose.
Clustered indexes can greatly increase overall speed of retrieval, but usually only where the data is accessed sequentially in the same or reverse order of the clustered index, or when a range of items are selected.
Since the physical records are in this sort order on disk, the next row item in the sequence is immediately before or after the last one, and so fewer data block reads are required.
With a clustered index the rows are stored physically on the disk in the same order as the index. There can therefore be only one clustered index.
See the origin answer in Stackoverflow