ansaurus

Question

sql 2008 performance on date range lookup (BETWEEN)

Answer 1

+3 A:

A clustered index is the best index for between queries.

Are you doing anything else in the WHERE part, then there may be ways to improve the query.

A quick why; a clustered index will sort the rows in the table (that is why you can only have one clustered index per table). So SQL server first needs to find the first row to return (@d1) then all the rows are stored in order, and are fast to retreive. The histdate = @d1 is quicker, as all it needs to do then is find the first row, it doesn't have to continue finding all the other rows until @d2.

Thies 2009-06-04 20:17:49

Answer 2

+1 A:

A non-clustered index on the column will yield the same performance as the clustered key, as long as the non-clustered index contains all the fields (in the index or as INCLUDE columns) for any SELECT statement against it (a so-called "covering index").

The only reason a clustered index would be faster for range queries in many cases is the fact that the clustered index IS the table data, so any column you might need is right there in the clustered index. With a covering index, you achieve the same result - if the index plus any INCLUDE columns contains all the columns your SELECT statement needs to retrieve, you can satisfy the query just by looking at the index - there's no need to jump from the non-clustered index into the actual table to fetch more data ("bookmark lookup", which tends to be slow for lots of lookups).

For a DELETE, I don't think it makes any difference, really, as long as you just have an index on that column used in the WHERE clause (histdate).

Marc

marc_s 2009-06-04 20:21:11

This is known as "convering" index.

RolandTumble 2009-06-04 20:24:41

You would have to make sure that all of your queries are covered though, which could cost a lot of disk space and could also slow down inserts and updates.

Tom H. 2009-06-04 20:29:49

Tom H: sure - if you have lots of different queries that need to be covered, you'll end up multiplying your data load. But if you have multiple queries, you probably also run into a problem trying to satisfy them all by a clustering index - you only get one per table...

marc_s 2009-06-04 20:38:38

Tom H: I'm just saying there's nothing inherently faster for range queries when comparing clustered index vs. covering non-clustered index. In both cases, you have all the data you need. You only get one clustered index per table, but if need be, you can have multiple covering non-clustered indices for improving performance. It's a classic space-vs-speed trade-off.

marc_s 2009-06-04 20:54:05

yep, the perf is the same when I just tried a non-clustered index

Scott Kramer 2009-06-04 21:58:35

I'd gladly hog a little space for some more perf!!

Scott Kramer 2009-06-04 21:59:45

Answer 3

+1 A:

DateTime is stored as a float internally, so your current example should be pretty efficient. One potential optimization would be have another column like DayOffset that is calculated along the lines of DATEDIFF(day, 0, histDate). This would go as the first column in the clustered key.

Then if you delete entire days at a time, you could just delete based on the DayOffset. If you do not want to delete on midnight boundaries you could delete based on the pair of DayOffset and date range.

WHERE DayOffset between DATEDIFF(day, 0, @d1) and DATEDIFF(day, 0, @d2)
  and histdate between @d1 and @d2

Another possible option is to use partitioning. It is far far more efficient to age out data by dropping an old partition than it is to delete rows.

2009-06-04 20:29:55

cool idea, I have the freedom to gen this hist table from scratch, so i'll try this... (it's about 50 million rows (but depends on the size of our customer), generated from invoice data)

Scott Kramer 2009-06-04 22:16:20

Answer 4

A:

try each of these with "SET SHOWPLAN_ON ALL"

delete from h1 where  histdate  between CONVERT(datetime,'1/1/2009') and CONVERT(datetime,'5/30/2009')

delete from h1 where  histdate =CONVERT(datetime,'1/1/2009')

delete from h1 where  histdate>= CONVERT(datetime,'1/1/2009') and histdate<CONVERT(datetime,'6/01/2009')

I'll bet the execute plans are all about the same, and use the index. The difference may be the number of rows in the range is greater than a exact "=" match

KM 2009-06-04 20:33:58

ansaurus

tags:

views:

answers:

sql 2008 performance on date range lookup (BETWEEN)

related questions