views:

364

answers:

7

My table (SQL Server 2008) has 1 million+ records, when I try to order records by datetime, it takes 1 second, but when I order by ID (int), it only takes about 0.1 second.

Is there any way to improve the efficiency? (I already added the datetime column to the index)

+6  A: 

Ordering by id probably uses a clustered index scan while ordering by datetime uses either sorting or index lookup.

Both these methods are more slow than a clustered index scan.

If your table is clustered by id, basically it means it is already sorted. The records are contained in a B+Tree which has a linked list linking the pages in id order. The engine should just traverse the linked list to get the records ordered by id.

If the ids were inserted in sequential order, this means that the physical order of the rows will match the logical order and the clustered index scan will be yet faster.

If you want your records to be ordered by datetime, there are two options:

  • Take all records from the table and sort them. Slowness is obvious.
  • Use the index on datetime. The index is stored in a separate space of the disk, this means the engine needs to shuttle between the index pages and table pages in a nested loop. It is more slow too.

To improve the ordering, you can create a separate covering index on datetime:

CREATE INDEX ix_mytable_datetime ON mytable (datetime) INCLUDE (field1, field2, …)

, and include all columns you use in your query into that index.

This index is like a shadow copy of your table but with data sorted in different order.

This will allow to get rid of the key lookups (since the index contains all data) which will make ordering by datetime as fast as that on id.

Update:

A fresh blog post on this problem:

Quassnoi
Is there any efficient way for doing this?
silent
I believe the database will have stored it in this format anyway (and likey compares it in this manner as well). I thought of the same thing at first too though but I don't believe that it should be the answer.
Jrud
+1 To optimize for the datetime sort, make the datetime the clustered index, and the id index a nonclustered primary key.
Andomar
A: 

maybe if you store datatime as a int but it would take time converting each time you store or get data. (common technique used to store staff like ip address and have a faster seek times)

you should check in your server how it stores datetime, b/c it your server already stores it as int or bigint.. it will not change anything....

Dani
+2  A: 

Add the date time to a new index, adding it to the id one will still not help much.

Mark Dickinson
A: 

If your datetime field contains a lot of distinct values and those values rarely change, define a clustered index on the datetime field, this will sort the actual data by the datetime value. See http://msdn.microsoft.com/en-us/library/aa933131(SQL.80).aspx for using clustered indexes.

This will make you int searches slower though, as they will be relegated to using a non-clustered index.

badbod99
+1  A: 

Could it be that there is an index for your int column but not for your datetime column? Look at the execution plan.

Nestor
+1 good point - check the execution plan! Is the index being used at all??
marc_s
A: 

Have you added the DateTime field to "the" index or to an exclusive index? Are you filtering your selection by another field and the DateTime or only this one?

You must have an index with all the fields that you are filtering and preferably in the same order to optmize performance.

j.a.estevan
I had added it to an existing index, I just tried to create a new index, it gets relatively faster (0.5 seconds), but still slower than an int column.
silent
It is important to make a good index looking at the fields in the "WHERE" and "ORDER BY" statements (and "GROUP BY", if any). It should be the same fields and in the same order. If you don't need every column in the table, make the SELECT only with the desired fields. Then clear the cache and statistics and try the results. Also think that is dificult to select at the same velocity of the clustered index in tables with lots of data. This is the reason to select carefully which should be the clustered one.
j.a.estevan
+3  A: 

To honor the ORDER BY the engine has two alternatives:

  • scan the rows using an index that offers the order requested
  • sort the rows

First option is fast, second is slow. The problem is that in order to be used, the index has to be a covering index. Meaning it contains all the columns in the SELECT projection list and all the columns used in WHERE clauses (at a minimum). If the index is not covering then the engine would have to lookup the clustered index (ie the 'table') for each row, in order to retrieve the values of the needed columns. This constant lookup of values is expensive, and there is a tipping point when the engine will (rightfully) decide is more efficient to just scan the clustered index and sort the result, in effect ignoring your non-clustered index. For details, see The Tipping Point Query Answers.

Consider the following three queries:

SELECT dateColumn FROM table ORDER BY dateColumn
SELECT * FROM table ORDER BY dateColumn
SELECT someColumn FROM table ORDER BY dateColumn

The first one will be be using a non-clustered index on dateColumn. But a the second one will not be using an index on dateColumn, will likely choose a scan and sort instead for 1M rows. On the other hand the third query can benefit from an index on Table(dateColumn) INCLUDE (someColumn).

This topic is covered at large on MSDN see Index Design Basics , General Index Design Guidelines , Nonclustered Index Design Guidelines or How To: Optimize SQL Indexes.

Ultimately, the most important choice of your table design is the clustered index you use. Almost always the primary key (usually an auto incremented ID) is left as the clustered index, a decision that benefits only certain OLTP loads.

And finally, a rather obvious question: Why in the world would you order 1 million rows?? You can't possibly display them, can you? Explaining a little bit more about your use case might help us find a better answer for you.

Remus Rusanu
Thank you Remus, I don't want to display 1 million+ records in one time, I'm using row_number() method for paging, but I found when I try to browse a big number, like page 50000 (20 rec/page), the query becomes very slow, but if I change the ordering method to ID, it gets almost 10 times faster.
silent
I thought this must be row_number pagination. Is it LINQ, by any chance? Your best bet is to express the query as follows: select the IDs of the page you're interested (ed. the 20 IDs on page 17) then retrieve the details of those 20 records. This *can* be expressed in T-SQL and also in LINQ and can be fast enough. You better post the schema and queries involved.
Remus Rusanu