views: 361
answers: 6
I have a table with 15 columns and 6.5 million records. I need to access this table from the C# side using paging. I have written a stored procedure, but it takes about 1 minute 30 seconds to retrieve the data. Here's my stored proc -

Create Proc demo
(
@startRowIndex int,
@maximumRows int
)
AS

DECLARE @first_id int, @startRow int

SET @startRowIndex =  (@startRowIndex - 1)  * @maximumRows

IF @startRowIndex = 0 
SET @startRowIndex = 1

SET ROWCOUNT @startRowIndex

SELECT @first_id = RecordID FROM edd_business_listings_05282009 ORDER BY RecordID

PRINT @first_id

SET ROWCOUNT @maximumRows

SELECT * FROM edd_business_listings_05282009 WHERE 
RecordID >= @first_id 
ORDER BY RecordID

SET ROWCOUNT 0

Does anyone know a way of making this run faster?

+2  A: 

Can your application send in the last RecordID?

Make the front-end work harder.

Create Proc demo ( @startRowID int, @maximumRows int ) AS

SET ROWCOUNT @maximumRows

SELECT * FROM edd_business_listings_05282009 WHERE RecordID > @startRowID ORDER BY RecordID

SET ROWCOUNT 0

brian chandley
@jfar: SET ROWCOUNT @maximumRows ensures it will return all 20 subsequent rows, if they exist. No?
Peter
Right, SET ROWCOUNT would not be affected by missing records.
brian chandley
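A minimal end-to-end sketch of this seek approach, assuming the client remembers the last RecordID it received (the page size of 20 and the second-call value are illustrative):

```sql
-- First page: start before the smallest RecordID.
EXEC demo @startRowID = 0, @maximumRows = 20;

-- Next page: the client passes back the last RecordID it just received
-- (here we assume the last row of page one had RecordID = 20).
EXEC demo @startRowID = 20, @maximumRows = 20;
```

Because the WHERE clause seeks directly to RecordID > @startRowID, an index on RecordID makes every page equally cheap, no matter how deep the user pages.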
A: 

If you use SQL Server 2005, you can try:

SELECT field1, field2, fieldN
FROM   (SELECT ROW_NUMBER() OVER (ORDER BY RecordID) AS Row,
               field1, field2, fieldN
        FROM edd_business_listings_05282009) AS ListingsWithRowNumbers
WHERE  Row >= @startRowIndex AND Row < @startRowIndex + @maximumRows

But anyway, try to rethink this architecture. What use is there in displaying millions of records (even paged) in a UI? You could limit the number of records and query only a subset initially...

Michał Chaniewski
This won't work past the first page of records, once @startRowIndex > @maximumRows.
Peter
Oops, copied the wrong version. I'm removing the TOP clause.
Michał Chaniewski
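One way to act on that advice is to page only within a filtered subset, so ROW_NUMBER() never has to number all 6.5 million rows. A sketch, where the State column and @State parameter are hypothetical stand-ins for whatever filter the UI offers:

```sql
SELECT field1, field2
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY RecordID) AS Row,
           field1, field2
    FROM edd_business_listings_05282009
    WHERE State = @State          -- hypothetical filter narrowing the subset
) AS FilteredListings
WHERE Row > @startRowIndex
  AND Row <= @startRowIndex + @maximumRows;
```

With a supporting index on the filter column, the numbered set shrinks from millions of rows to whatever the filter matches.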
A: 

Try putting an index on the RecordID column. What I think is happening is that you're doing an entire table scan before the ROWCOUNT takes effect, so SQL can order everything. If you already have an index, then something else is the problem. I've done this same query on tables with twice the number of records and my execution time never went above 2 seconds.

Using ROWCOUNT or ROW_NUMBER() should technically accomplish the same thing performance-wise, but I'd use ROW_NUMBER(), as it is the more modern way of doing this, and SET ROWCOUNT comes with a lot more complexities than ROW_NUMBER() that I won't get into.

jfar
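If the index really is missing, creating one on the paging key is the first step. A sketch (the index name is illustrative):

```sql
-- An index on RecordID lets the ORDER BY read rows in index order
-- instead of scanning and sorting all 6.5 million rows on every call.
CREATE INDEX IX_listings_RecordID
    ON edd_business_listings_05282009 (RecordID);
```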
+1  A: 

Try using ROW_NUMBER in SQL 2005: http://www.4guysfromrolla.com/webtech/010406-1.shtml

A procedure such as this would help:

CREATE PROCEDURE dbo.GetListingPaged
(
    @StartRowIndex int,
    @MaximumRows int
)
AS
    SELECT
        RecordID,
        Field2 -- Not *
    FROM
    (
        SELECT
            RecordID,
            Field2, -- Not *
            ROW_NUMBER() OVER (ORDER BY RecordID) AS RowRank
        FROM edd_business_listings_05282009
    ) AS ListingWithRowNumbers
    WHERE
        RowRank > @StartRowIndex
    AND
        RowRank <= (@StartRowIndex + @MaximumRows)

GO
Junto
I'd like to see response time vs paging depth with this approach.
Peter
+1  A: 

OK, sure, here's my guess too:

Create Proc demo ( @startRowIndex int, @maximumRows int ) AS
DECLARE @first_id int, @startRow int
SET @startRowIndex = (@startRowIndex - 1) * @maximumRows

IF @startRowIndex = 0 SET @startRowIndex = 1

SELECT TOP (@maximumRows)
    {'all columns except N'}
FROM (
    SELECT *, ROW_NUMBER() OVER (ORDER BY RecordID) AS N
    FROM edd_business_listings_05282009
) AS t
WHERE N >= @startRowIndex
ORDER BY RecordID
RBarryYoung
+1  A: 

The best solution is going to depend heavily on:

1. how often the data changes,

2. how often the sproc is called and how deep a user will typically page, and

3. how much latency (if any) you can accept in the ordering being up-to-date.

Often in a website backend your users only use the first few pages, but Googlebot can slam your deep pages and therefore slam your CPU. It is usually fine to support this live ordering only up to a certain point (e.g. use ROW_NUMBER() for the first few hundred or thousand rows) and then switch to a denormalized list of the ordering that is refreshed over some interval (perhaps hourly).
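A sketch of that denormalized approach, assuming a hypothetical ListingPageOrder table rebuilt by a scheduled job:

```sql
-- Hypothetical precomputed ordering table.
CREATE TABLE ListingPageOrder
(
    RowRank  int NOT NULL PRIMARY KEY,
    RecordID int NOT NULL
);

-- Refresh job (run e.g. hourly): renumber the whole table once,
-- instead of on every page request.
TRUNCATE TABLE ListingPageOrder;
INSERT INTO ListingPageOrder (RowRank, RecordID)
SELECT ROW_NUMBER() OVER (ORDER BY RecordID), RecordID
FROM edd_business_listings_05282009;

-- Deep page reads then become a cheap range seek plus join:
SELECT l.*
FROM ListingPageOrder AS o
JOIN edd_business_listings_05282009 AS l ON l.RecordID = o.RecordID
WHERE o.RowRank > @StartRowIndex
  AND o.RowRank <= @StartRowIndex + @MaximumRows
ORDER BY o.RowRank;
```

The trade-off is exactly the latency mentioned above: rows inserted or deleted since the last refresh won't be reflected until the job runs again.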