I have a query (which was created by LINQ to SQL) to get me a list of 'site visits' that were made between a certain date range which resulted in an order (orderid is not null).

There's nothing wrong with the query itself; I just need advice on creating the correct index for it. I was playing around trying different combinations on a production site and managed to screw things up such that a foreign key got disconnected. I fixed that after some panic, but thought I'd ask for advice now before recreating an index.

The table is getting close to a million rows and I need the indexes to help me out here. This query is only used for reporting, so it doesn't have to be extremely fast; it just shouldn't delay other users' queries (which it is doing).

SELECT TOP 1000
  t0.SiteVisitId, t0.OrderId, t0.Date, 
  t1.Domain, t0.Referer, t0.CampaignId
FROM 
  SiteVisit AS t0
  LEFT OUTER JOIN KnownReferer AS t1 ON t1.KnownRefererId = t0.KnownRefererId
WHERE
  t0.Date <= @p0 
  AND t0.Date >= @p1
  AND t0.OrderId IS NOT NULL
ORDER BY 
  t0.Date DESC

@p0='2008-11-1 23:59:59:000', @p1='2008-10-1 00:00:00:000'

I currently have a clustered index on SiteVisitId, which is my identity integer column.

I don't know which of the following is likely to be most efficient:

  • Create an index on Date
  • Create an index on Date AND a separate index on OrderId
  • Create a 'multicolumn' index on Date AND OrderId
  • Some other combination?
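
For reference, the options above would look something like this (index names are my own, just for illustration):

```sql
-- Option 1: single-column index on Date
CREATE NONCLUSTERED INDEX IX_SiteVisit_Date ON SiteVisit ([Date]);

-- Option 2: two separate single-column indexes
CREATE NONCLUSTERED INDEX IX_SiteVisit_Date    ON SiteVisit ([Date]);
CREATE NONCLUSTERED INDEX IX_SiteVisit_OrderId ON SiteVisit (OrderId);

-- Option 3: one multicolumn (composite) index
CREATE NONCLUSTERED INDEX IX_SiteVisit_Date_OrderId ON SiteVisit ([Date], OrderId);
```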

I am also wondering whether I should create a separate bit column, hasOrder, instead of checking OrderId IS NOT NULL, in case that might be more efficient.

FYI: KnownReferer is just a table containing a list of 100 or so known HTTP referers, so I can easily see how many hits come from Google, Yahoo, etc.

+1  A: 

How many rows do you expect to have between a typical date range? Are you typically looking at a month at a time?

I would start out with an index over the [Date] column. If, for a typical query, your resulting row count is small you shouldn't need to add the [OrderId] column to your index.

On the other hand, if you have a large number of rows in a typical month, then you can add the [OrderId] column to the index, though since it's being treated as a boolean value, it may not buy you much. It depends on how many rows are NULL vs. NOT NULL. If you have a lot of rows for a given month, but only a few have a valid [OrderId], then the index would probably improve performance.

Read the accepted answer in this related question and determine if it's worth indexing over the additional column:

http://stackoverflow.com/questions/231125/should-i-index-a-bit-field-in-sql-server

And of course, test the indexes and the plans generated with and without the index.

Update: Some of the other answers specify a more aggressive index, which should improve performance of this query, but may adversely affect other operations on the table. For example, the covering index suggested will allow SQL Server to process this query with little impact to the actual table, but may cause problems when other queries write to the actual table (since SQL Server will need to update both the table and covering index in that case).

Because this is a reporting query, I would optimize it as little as possible. If this query is running long, causing other, more critical queries to run slowly or time out, I would only optimize this query enough to reduce its effect on those other queries.

Though, if you expect this table to continue growing I would consider a separate reporting schema and periodically extract data from this table.

Brannon
A: 

It's also worth considering whether you need to store rows in SiteVisit that don't have a KnownRefererId in your KnownReferer table and that have a NULL OrderId. If you don't need these, delete them from the table and change your clustered index to be on both SiteVisitId and Date, and the query should be pretty quick.

But I'm sure you're storing those extra rows for a reason.

Ben R
A: 

If you really want to optimize the bejesus out of this query, and you can accept a slightly slower insert into the table, you would create an index on:

(Date, OrderId, SiteVisitId, Domain, Referer, CampaignId)

This will let the database return an answer entirely from the index without any sorting or separate table access.
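
As DDL that would look roughly like the following. Note that Domain actually lives in KnownReferer, not SiteVisit, so it can't be part of a SiteVisit index; this sketch covers only the SiteVisit columns:

```sql
-- Covering index: the query can be answered from the index alone,
-- without sorting or a separate table access.
CREATE NONCLUSTERED INDEX IX_SiteVisit_Covering
ON SiteVisit ([Date], OrderId, SiteVisitId, Referer, CampaignId);
```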

WW
I wouldn't include that many columns in an index unless proven to be 100% necessary. If using SQL Server 2005, you would be better off INCLUDE-ing the columns that remove a bookmark lookup into the clustered index (also, Domain is in another table!)
Mitch Wheat
+1  A: 

I would create an index on Date and OrderId and INCLUDE columns SiteVisitId, Referer, CampaignId (assuming you are using SQL Server 2005 onwards). Also create an index on the foreign key column KnownRefererId.
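
A sketch of those two indexes (names are illustrative):

```sql
-- Key on the filtered/sorted columns; INCLUDE the rest so the query is
-- covered without widening the index key (SQL Server 2005+).
CREATE NONCLUSTERED INDEX IX_SiteVisit_Date_OrderId
ON SiteVisit ([Date], OrderId)
INCLUDE (SiteVisitId, Referer, CampaignId);

-- Index on the foreign key used by the join to KnownReferer.
CREATE NONCLUSTERED INDEX IX_SiteVisit_KnownRefererId
ON SiteVisit (KnownRefererId);
```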

Given that this is a reporting query and could withstand the odd uncommitted row, I would suggest using the NOLOCK (or READ UNCOMMITTED) hint:

// Requires a reference to the System.Transactions assembly
using (var trans = new TransactionScope(TransactionScopeOption.Required,
                      new TransactionOptions
                      {
                          IsolationLevel = IsolationLevel.ReadUncommitted
                      }))
{
    // Put your LINQ to SQL query here

    trans.Complete();
}

Ref.

Caveat: Only use NOLOCK hints where you have a very good reason. In the past I have seen developers come to grief through blanket use!

Mitch Wheat
+1 for the excellent index advice, -0.5 for suggesting NOLOCK; it'll just encourage using it all the time. Better to tune the indexing scheme to lower the number of records read than even try the READ UNCOMMITTED route.
SqlACID
@SqlACID: that's why I put "Caveat: Only use NOLOCK hints where you have a very good reason"!
Mitch Wheat
reports can "withstand" uncommitted rows? HUH? That's like saying Science can get along fine with a handful of untrue facts. Seriously, SQL Server people have a totally different perception of "TRUTH". What if it's rolled back? You'll have a report that includes that record, and no record anywhere.
A: 
SELECT TOP 1000
  t0.SiteVisitId, t0.OrderId, t0.Date, 
  t1.Domain, t0.Referer, t0.CampaignId
FROM 
  SiteVisit AS t0
LEFT OUTER JOIN KnownReferer AS t1 ON t1.KnownRefererId = t0.KnownRefererId
WHERE
  t0.Date <= @p0 
  AND t0.Date >= @p1
  AND t0.OrderId IS NOT NULL
ORDER BY 
  t0.Date DESC

@p0='2008-11-1 23:59:59:000', @p1='2008-10-1 00:00:00:000'

I'm going to guess on table stats here, and the resulting design could slow down other queries - but that's generally the trade-off. I usually find that when moving a clustered index, it's best to create a replacement index to avoid upsetting other queries too much.

On the assumption that there are a lot of rows in the one-month date range, and relatively few of those have OrderId IS NULL - you'd be best off with a clustered index on Date. That should give you a clustered index scan, with results nicely ordered for your TOP 1000.
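
Moving the clustered index would look roughly like this, keeping a nonclustered index on SiteVisitId as the replacement (the constraint name PK_SiteVisit is a guess - check yours with sp_help):

```sql
-- Assuming SiteVisitId is currently the clustered primary key:
ALTER TABLE SiteVisit DROP CONSTRAINT PK_SiteVisit;

-- Keep SiteVisitId lookups fast via a nonclustered primary key.
ALTER TABLE SiteVisit ADD CONSTRAINT PK_SiteVisit
    PRIMARY KEY NONCLUSTERED (SiteVisitId);

-- Cluster on Date so the range scan comes back pre-sorted.
CREATE CLUSTERED INDEX CIX_SiteVisit_Date ON SiteVisit ([Date]);
```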

You may also want KnownReferer.KnownRefererId to be either the clustered index, or in a combined index of KnownRefererId + Domain, to avoid a lookup into that table. I'd guess your number of KnownReferers is small though - so I wouldn't expect much benefit from this.
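
The combined-index variant would be along these lines (name is illustrative):

```sql
-- Covers the join: KnownRefererId to seek, Domain to avoid a lookup.
CREATE NONCLUSTERED INDEX IX_KnownReferer_Id_Domain
ON KnownReferer (KnownRefererId, Domain);
```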

Mark Brackett