tags:

views:

491

answers:

4

What is the best index or ideas to improve date range lookups? (or query change)

    begin tran t1
      delete from h1 where histdate between @d1 and @d2 and HistRecordType = 'L'
      insert into h1 
             select * from v_l WHERE HistDate between @d1 and @d2 
     commit tran t1

it is far slower than histdate = @d1

I have a clustered, non-unique index on the date column

however, the perf is the same switching to a non-clustered

if @d1 = @d2.. the query takes 8mins to run, histdate=@d1 runs in 1 second (so that should sort of be equiv right?)

+3  A: 

A clustered index is the best index for between queries.

Are you doing anything else in the WHERE part, then there may be ways to improve the query.

A quick why; a clustered index will sort the rows in the table (that is why you can only have one clustered index per table). So SQL server first needs to find the first row to return (@d1) then all the rows are stored in order, and are fast to retreive. The histdate = @d1 is quicker, as all it needs to do then is find the first row, it doesn't have to continue finding all the other rows until @d2.

Thies
+1  A: 

A non-clustered index on the column will yield the same performance as the clustered key, as long as the non-clustered index contains all the fields (in the index or as INCLUDE columns) for any SELECT statement against it (a so-called "covering index").

The only reason a clustered index would be faster for range queries in many cases is the fact that the clustered index IS the table data, so any column you might need is right there in the clustered index. With a covering index, you achieve the same result - if the index plus any INCLUDE columns contains all the columns your SELECT statement needs to retrieve, you can satisfy the query just by looking at the index - there's no need to jump from the non-clustered index into the actual table to fetch more data ("bookmark lookup", which tends to be slow for lots of lookups).

For a DELETE, I don't think it makes any difference, really, as long as you just have an index on that column used in the WHERE clause (histdate).

Marc

marc_s
This is known as "convering" index.
RolandTumble
You would have to make sure that all of your queries are covered though, which could cost a lot of disk space and could also slow down inserts and updates.
Tom H.
Tom H: sure - if you have lots of different queries that need to be covered, you'll end up multiplying your data load. But if you have multiple queries, you probably also run into a problem trying to satisfy them all by a clustering index - you only get one per table...
marc_s
Tom H: I'm just saying there's nothing inherently faster for range queries when comparing clustered index vs. covering non-clustered index. In both cases, you have all the data you need. You only get one clustered index per table, but if need be, you can have multiple covering non-clustered indices for improving performance. It's a classic space-vs-speed trade-off.
marc_s
yep, the perf is the same when I just tried a non-clustered index
Scott Kramer
I'd gladly hog a little space for some more perf!!
Scott Kramer
+1  A: 

DateTime is stored as a float internally, so your current example should be pretty efficient. One potential optimization would be have another column like DayOffset that is calculated along the lines of DATEDIFF(day, 0, histDate). This would go as the first column in the clustered key.

Then if you delete entire days at a time, you could just delete based on the DayOffset. If you do not want to delete on midnight boundaries you could delete based on the pair of DayOffset and date range.

WHERE DayOffset between DATEDIFF(day, 0, @d1) and DATEDIFF(day, 0, @d2)
  and histdate between @d1 and @d2

Another possible option is to use partitioning. It is far far more efficient to age out data by dropping an old partition than it is to delete rows.

cool idea, I have the freedom to gen this hist table from scratch, so i'll try this... (it's about 50 million rows (but depends on the size of our customer), generated from invoice data)
Scott Kramer
A: 

try each of these with "SET SHOWPLAN_ON ALL"

delete from h1 where  histdate  between CONVERT(datetime,'1/1/2009') and CONVERT(datetime,'5/30/2009')

delete from h1 where  histdate =CONVERT(datetime,'1/1/2009')

delete from h1 where  histdate>= CONVERT(datetime,'1/1/2009') and histdate<CONVERT(datetime,'6/01/2009')

I'll bet the execute plans are all about the same, and use the index. The difference may be the number of rows in the range is greater than a exact "=" match

KM