views:

277

answers:

2

I get a bizarre execution plan behavior from SQL Server (2005).

Table:

TableName: LOG
ID int,
Name varchar(50)

(Table contains around 1000 rows)

Query:

Select * FROM
(
 SELECT ROW_NUMBER() OVER (ORDER BY ID DESC) as Row,
 ID, Name
 FROM Log
)
as LogWithRowNumbers
WHERE Row>=1 and Row<=2

It estimates the number of rows returned as 9 (Although it's an obvious 2 or less). Also, removing the "and Row<=2" will increase the execution time by about * 5.

("and Row<=2" and "and Row<=9999999999999" behave the same)

I've updated statistics. But still, this behavior is strange. Adding Row<99999999999 will make the query run faster? why ?

+1  A: 

I'm no expert on the inner workings/query optimisation process of SQL Server, but here's my 2 pence (or cents if you prefer).

I believe it's due to the use of the ROW_NUMBER() value within the WHERE clause. As an example, I created a sample table, populated with 1000 rows from ID 1 to 1000 (ID as a primary key) like you said.

If you take out the ROW_NUMBER(), and do the query based on the ID column like this:

Select * FROM
(
 SELECT ID, Name
 FROM Log
)
as LogWithRowNumbers
WHERE ID>=1 and ID<=2

Then it correctly shows row count as being 2 - as expected really.

Now, working backwards, add in the ROW_NUMBER to the inner SELECT but leave the WHERE clause as-is like below:

Select * FROM
(
 SELECT ROW_NUMBER() OVER (ORDER BY ID DESC) AS RowNo,
 ID, Name
 FROM Log
)
as LogWithRowNumbers
WHERE ID>=1 AND ID <=2

This still shows the correct row count as 2.

Finally, set the WHERE clause back to use the RowNo as the column being filtered on instead of ID, this is when the estimated row count jumps to 9.

Hence, I believe it's the use of the ROW_NUMBER() function, being filtered on in the WHERE clause that is the cause. So I'd imagine it's because there are obviously better/more accurate statistics available on actual table columns than there are on this function-produced value.

I hope this at least provides a good starting point, hopefully will be useful!

AdaTheDev
+1  A: 

AdaTheDev is correct. The reason you see this behaviour is because SQL Server has to work out the row numbers for the table before it can use them in the where clause.

This should to be a more efficient way of getting the same results:

 SELECT TOP(2) ROW_NUMBER() OVER (ORDER BY ID DESC) as Row,
 ID, Name
 FROM Log
 ORDER BY ID DESC
AndrewWithey
I get what you're saying, only it's an example of a query I make. I usually retrieve rows 1-10, 10-20 etc. Problem is, I have to do it without the "Row<999999" and that costs more
Faruz