views:

42

answers:

2

Here's a table:

CREATE TABLE Meetings
(
  ID int PRIMRY KEY IDENTITY(1,1)
  StartDate DateTime NOT NULL,
  EndDate DateTime NULL,
  Field1 varchar(50),
  Field2 varchar(50),
  Field3 varchar(50),
  Field4 varchar(50)
)

There's several thousand rows. The data ranges can be varying sizes (from a couple days up to 50 years).

Here's a query:

DECLARE @ApplicableDate DateTime

SELECT ID, StartDate, EndDate, Field1, Field2, Field3, Field4
FROM Meetings
WHERE StartDate <= @ApplicableDate AND
  (EndDate is null || @ApplicableDate <= EndDate)

Since the date ranges can be large, a large portion of the table might be returned (20%-50% of the rows).

The query represents the rows I want in a simple way, but the performance is pretty bad. It does a clustered index scan, no matter what indexes I add. I've tried:

  • StartDate
  • StartDate, EndDate

How can I improve the performance of this query?


I've reviewed the answers for this question and this one too. Those solutions aren't helpful in my situation - I don't really want to muck with the business' data by creating a separate table of Dates to turn the query into an equality query (what happens when end date is modified, or null?), or by morphing the data to fit in a spatial index.

Still, I'm open to possible modifications to the data structure (particular if they do not add rows, and do not use strange data types)..

+1  A: 

If the query returns 20%-50% of the records, then a scan is many times the best option. If you have an index, you always have to find the data in the index,which then contains a record address in the table, and you then have to get the page containing this record from disk, risking that adjacent records in the index are spread all over the disk.

If you really need that many records and performance is bad, then maybe check the following:

  • Is the disk speed an issue?
  • Is it the network bandwidth?
  • Are you restricted in RAM/Cache?
Frank
+3  A: 

I'm assuming you're on SQL Server for my syntax.

Make your primary key on ID a non-clustered index.

ID int PRIMARY KEY NONCLUSTERED IDENTITY(1,1),

Create a clustered in index on the StartDate column.

CREATE CLUSTERED INDEX ix_Meetings_StartDate
ON Meetings (StartDate)

Try your query as is. Even though the data is probably stored similarly to what you had with the clustered PK, now the query engine will know in advance that the data is clustered by the start date.

bobs