views:

671

answers:

2

EDIT: I'm still waiting for more answers. Thanks!

In SQL 2000 days, I used to use temp table method where you create a temp table with new identity column and primary key then select where identity column between A and B.

When SQL 2005 came along I found out about Row_Number() and I've been using it ever since...

But now, I found a serious performance issue with Row_Number(). It performs very well when you are working with not-so-gigantic result sets and sorting over an identity column. However, it performs very poorly when you are working with large result sets like over 10,000 records and sorting it over non-identity column. Row_Number() performs poorly even if you sort by an identity column if the result set is over 250,000 records. For me, it came to a point where it throws an error, "command timeout!"

What do you use to do paginate a large result set on SQL 2005? Is temp table method still better in this case? I'm not sure if this method using temp table with SET ROWCOUNT will perform better... But some say there is an issue of giving wrong row number if you have multi-column primary key.

In my case, I need to be able to sort the result set by a date type column... for my production web app.

Let me know what you use for high-performing pagination in SQL 2005. And I'd also like to know a smart way of creating indexes. I'm suspecting choosing right primary keys and/or indexes (clustered/non-clustered) will play a big role here.

Thanks in advance.

P.S. Does anyone know what stackoverflow uses?

EDIT: Mine looks something like...

SELECT postID, postTitle, postDate
FROM
   (SELECT postID, postTitle, postDate, 
         ROW_NUMBER() OVER(ORDER BY postDate DESC, postID DESC) as RowNum
    FROM MyTable
   ) as DerivedMyTable
WHERE RowNum BETWEEN @startRowIndex AND (@startRowIndex + @maximumRows) - 1

postID: Int, Identity (auto-increment), Primary key

postDate: DateTime

EDIT: Is everyone using Row_Number()?

+6  A: 

The row_number() technique should be quick. I have seen good results for 100,000 rows.

Are you using row_number() similiar to the following:

SELECT column_list
FROM
   (SELECT column_list
         ROW_NUMBER() OVER(ORDER BY OrderByColumnName) as RowNum
    FROM MyTable m
   ) as DerivedTableName
WHERE RowNum BETWEEN @startRowIndex AND (@startRowIndex + @maximumRows) - 1

...and do you have a covering index for the column_list and/or an index on the 'OrderByColumnName' column?

Mitch Wheat
Thanks for your input. I updated my question.
Brian Kim
A: 

Well, for your sample query ROW_COUNT should be pretty fast with thousands of rows, provided you have an index on your PostDate field. If you don't, the server needs to perform a complete clustered index scan on your PK, practically load every page, fetch your PostDate field, sort by it, determine the rows to extract for the result set and again fetch those rows. It's kind of creating a temp index over and over again (you might see an table/index spool in the plain).

No wonder you get timeouts.

My suggestion: set an index on PostDate DESC, this is what ROW_NUMBER will go over - (ORDER BY PostDate DESC, ...)

As for the article you are referring to - I've done pretty much paging and stuff with SQL Server 2000 in the past without ROW_COUNT and the approach used in the article is the most efficient one. It does not work in all circumstances (you need unique or almost unique values). An overview of some other methods is here.

.

liggett78