ansaurus

Question

select top 1000, but know how many rows are there?

Answer 1

+5 A:

Personally I'd opt for two statements hitting the database. One to retrieve the count, one to retrieve the first 1000 records.

You could run both queries in a batch to squeeze a little extra performance by saving a round-trip to the database.

-- Get the count
select count(*) from table where [criteria]

-- Get the data
select [cols] from table where [criteria]

Neil Barnwell 2008-12-22 17:26:03

You could also UNION those two if you want. Or have them in a stored proc.

Filip Ekberg 2008-12-22 17:43:17

You can't union them unless they have the same number and types of columns, which isn't very likely.

Kibbee 2008-12-22 18:18:56

This, but use count(PK), not count(*).

Greg Hurlman 2008-12-22 18:22:03

Greg - that's a false optimization and a common misconception. There's no difference in performance. Count(anything) just counts rows, period, and the optimizer knows it.

Dave Markle 2008-12-22 18:34:04

@Neil - this can be done in one simple statement in SQL 2005. No need for two queries.

Dave Markle 2008-12-22 18:34:58

I will do this, but not in batch nor sequence. I will do the TOP 1000 for the results, and the count(*) as a separate ASYNC operation that when retunrs, updates a AJAX panel. making the results come faster...

csmba 2008-12-23 00:26:20

@Dave Thanks for the tip - I'm assuming that's new for SQLS2005? Wonder what else I've missed...

Neil Barnwell 2008-12-23 17:29:58

Answer 2

+3 A:

You want to use "count" and group by, check out this reference: http://msdn.microsoft.com/en-us/library/ms175997.aspx

Also your problem seems to be posted here: http://www.eggheadcafe.com/software/aspnet/32427870/select-top-n-plus-a-count.aspx

Good Luck

Filip Ekberg 2008-12-22 17:26:49

Why isn't this at the top?! This is the correct answer.

Dave Markle 2008-12-22 18:18:51

Good Question, someone should press "Mark as Answer" ;)

Filip Ekberg 2008-12-22 18:48:22

Answer 3

A:

You may be over-estimating the performance benefit of the "top 1000", especially if there would only be ~5000 total results.

The server already has to do all the joins and stuff (which is typically the hard part), then has to order the result set the way you specified, THEN finally takes the 1000 first results.

Two options here:

1) Do one query with a Count(*) to get the count of results, then do a second query with your top 1000, retrieving the appropriate columns (as Neil suggests). OR
2) Retrieve all rows the first time, cache them in a result set, then only display 1000 rows to the user.

It might sound like the first would be faster, but the second only has to hit the database once, and depending on the specifics of your database and query, might be better (as long as the database isn't going to return 100,000 rows!)

BradC 2008-12-22 17:33:23

without top, there might be 100K rows...

csmba 2008-12-22 17:40:58

then you definitely don't want to cache them client-side!

BradC 2008-12-22 18:09:16

Answer 4

A:

Since you're using SQL Server 2005 you get to use a CTE for this kind of query. Here's what I'm currently doing for a client:

;WITH Search_Results AS
(
     SELECT TOP(@system_max_rows)
          my_column1,
          my_column2,
          ROW_NUMBER() OVER
          (
               ORDER BY
                    -- Your order criteria here
          ) AS row_num,
          COUNT(my_column1) OVER (PARTITION BY '') As total_count
     FROM
          My_Table
     -- Put any joins here
     WHERE
          -- Put WHERE criteria here
)
SELECT
     my_column1,
     my_column2,
     row_num,
     total_count
FROM
     Search_Results
WHERE
     ((row_num - 1)/@rows_per_page) + 1 = CASE
                WHEN ((total_count - 1)/@rows_per_page) + 1 < @page_number THEN ((total_count - 1)/@rows_per_page) + 1
                ELSE @page_number
           END
OPTION (RECOMPILE)

The recompile is there because the search criteria can change drastically between calls to the stored procedure making a cached query plan bad. Hopefully the parameters/variables are obvious. This was for a paged search solution. The system max rows variable is hard-coded so that even the application can't override the maximum number of rows that can be returned and crash the server. To get the top 1000 you would pass in @page_number = 1 and @rows_per_page = 1000.

Tom H. 2008-12-22 18:09:31

Answer 5

A:

This won't answer your questions, but I think this a good case where we implement the:

Hardware is cheap
Programmers are not cheap

Rule.

Yes doing two queries (one count, one select *) is not an optimal solution (you could do it in one query!). But how sure are you that this is going to be a bottleneck?

I get stuck on these questions and I have to remember this myself to continue developing.

Write it fast, write it better, PROFILE, then write it best.

Or, maybe this is the bottleneck and thats why you're asking. But it seems to me that if it was your bottleneck, a better optimization would to be restructure the database so that your query itself is simplified.

jskulski 2008-12-22 18:16:54

don't be silly. I would not waste anyone's time. I AM SURE!It takes 10 min to return 100K rows from a DB of tens of millions of rows. It takes only few sec to get top 1000. So yes, this is a real problem and worth my programmers time, and no HW upgrade can fix it.

csmba 2008-12-23 00:05:55

Answer 6

+7 A:

This is dead simple in SQL 2005. Why so many long answers?

SELECT TOP 1000 x,y,z, COUNT(*) OVER () AS TotalCount
FROM dbo.table

Dave Markle 2008-12-22 18:20:55

Very clever - thanks. More examples here: http://www.sqlteam.com/article/sql-sever-2005-using-over-with-aggregate-functions

Neil Barnwell 2008-12-23 17:32:13

ansaurus

tags:

views:

answers:

select top 1000, but know how many rows are there?

related questions