views:

11

answers:

2

I have a table which keeps parent-child-relations between items. Those can be changed over time, and it is necessary to keep a complete history so that I can query how the relations were at any time.

The table is something like this (I removed some columns and the primary key etc. to reduce noise):

CREATE TABLE [tblRelation](
    [dtCreated] [datetime] NOT NULL,
    [uidNode] [uniqueidentifier] NOT NULL,
    [uidParentNode] [uniqueidentifier] NOT NULL
)

My query to get the relations at a specific time is like this (assume @dt is a datetime with the desired date):

SELECT *
  FROM (
    SELECT ROW_NUMBER() OVER (PARTITION BY r.uidNode ORDER BY r.dtCreated DESC) ix, r.*
      FROM [tblRelation] r
      WHERE (r.dtCreated < @dt)
    ) r
  WHERE r.ix = 1

This query works well. However, the performance is not yet as good as I would like. When looking at the execution plan, it basically boils down to a clustered index scan (36% of cost) and a sort (63% of cost).

What indexes should I use to make this query faster? Or is there a better way altogether to perform this query on this table?

+1  A: 

The ideal index for this query would be with key columns uidNode, dtCreated and included columns all remaining columns in the table to make the index covering as you are returning r.*. If the query will generally only be returning a relatively small number of rows (as seems likely due to the WHERE r.ix = 1 filter) it might not be worthwhile making the index covering though as the cost of the key lookups might not outweigh the negative effects of the large index on CUD statements.

Martin Smith
Thanks! So if I don't actually do `r.*` (which was to reduce noise as well) you're saying that I should include all rows returned here in the index, right? Would it make sense to create the index as clustered (and not include all columns then)? Inserts will be quite rare (maybe 1 insert in the same time 10000 queries are done), deletions and updates will never take place. And does it matter if I make the index on `dtCreated` ascending or descending?
Lucero
*you're saying that I should include all rows returned here in the index, right?* **All *columns* yes.** *Would it make sense to create the index as clustered (and not include all columns then)?* . **If you don't have a more obvious candidate for a clustered index then possibly. The ideal clustered index key is narrow, unique, stable (not updated often), and ever increasing**. *And does it matter if I make the index on dtCreated ascending or descending?* **Not for the query you have shown. It might make a difference if you have an `ORDER BY` on the results of the `SELECT` though.**
Martin Smith
Yeah, I meant columns, not rows, of course! If I were to cluster on this, it would be more or less narrow (guid+datetime), unique, stable, but not ever increasing. I'll think about it, thank you very much for the inputs.
Lucero
A: 

The window/rank functions on SQL Server 2005 are not that optimal sometimes (based on answers here). Apparently better in SQL Server 2008

Another alternative is something like this. I'd have a non-clustered index on (uidNode, dtCreated) INCLUDE any other columns required by SELECT. Subject to what Martin Smith said about lookups.

WITH MaxPerUid AS
(
    SELECT
       MAX(r.dtCreated) AS MAXdtCreated, r.uidNode
    FROM
       MaxPerUid
    WHERE
       r.dtCreated < @dt
    GROUP BY
       r.uidNode
)
SELECT
    ...
FROM
   MaxPerUid M
   JOIN
   MaxPerUid R ON M.uidNode = R.uidNode AND M.MAXdtCreated = R.dtCreated
gbn