I have a question about SQL Server indexes. I'm not a DBA and assume the answer is clear for those of you that are. I am using SQL Server 2008.
I have a table that is similar to the following (but has more columns):
CREATE TABLE [dbo].[Results](
[ResultID] [int] IDENTITY(1,1) NOT NULL,
[TypeID] [int] NOT NULL,
[ItemID] [int] NOT NULL,
[QueryTime] [datetime] NOT NULL,
[ResultTypeID] [int] NOT NULL,
[QueryDay] AS (datepart(day,[querytime])) PERSISTED,
[QueryMonth] AS (datepart(month,[querytime])) PERSISTED,
[QueryYear] AS (datepart(year,[querytime])) PERSISTED,
CONSTRAINT [PK_Results] PRIMARY KEY CLUSTERED
(
[ResultID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
The important fields to notice here are ResultID, the primary key, and QueryTime the datetime at which the result was produced.
I also have the following index (amongst others):
CREATE NONCLUSTERED INDEX [IDX_ResultDate] ON [dbo].[Results]
(
[QueryTime] ASC
)
INCLUDE ( [ResultID],
[ItemID],
[TypeID]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
In a database where I have about a million rows in the table, the index is used when doing a query such as:
select top 1 * from results where querytime>'2009-05-01' order by ResultID asc
In another instance of the same database, with 50 million rows, SQL Server decides not to use the index as it rather does a Clustered Index Scan which ends up being horribly slow. (and speed depends on the date). Even if I use query hints to make it use IDX_ResultDate, it is still a bit slow and it spends 94% of its time sorting by ResultID. I figured that by creating an index with both ResultID and QueryTime as sorted columns in the index, I could speed up my query.
I therefore created the following:
CREATE NONCLUSTERED INDEX [IDX_ResultDate2] ON [dbo].[Results]
(
[QueryTime] ASC,
[ResultID] ASC
)
INCLUDE ( [ItemID],
[TypeID]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
GO
I assumed that it would first use the sort by QueryTime to find the matching results, which would already be sorted by ResultID. However, this is not the case as this index changes nothing in performance over the existing one.
I then tried the following index:
CREATE NONCLUSTERED INDEX [IDX_ResultDate3] ON [dbo].[Results]
(
[ResultID] ASC,
[QueryTime] ASC
)
INCLUDE ( [ItemID],
[TypeID]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
GO
This one produces the intended result. It appears to return in constant time (a fraction of a second).
However, I am puzzled at why IDX_ResultDate3 works well whereas IDX_ResultDate2 doesn't.
I would assume that a binary search in as sorted list of QueryTime followed by peeking at the first result in it's child list of ResultIDs is the fastest way at getting the result. (Hence my initial sort order).
Side question: Should I create a persisted column with the date portion of QueryTime and index on that instead (I already have three persisted columns as you can see above)?