views:

980

answers:

6

SQL Server 2005

I have 10 million rows in DB, and run a select (with lots of "where" and joints.. pretty complex). The results are presented in grid (think goolge results) and because of that, the user cannot possibly use more then 1000 results.

So I limit my SQL with a TOP 1000.

Problem: User still wants to know that there are 5432 results for his search.

Can I get that info without paying the price (or in other words, with still getting the speed benefits the "top 1000" is giving me ?)

Assumptions - Assume that the TOP 1000 brings 1000 out of 100K rows. So even the network price of moving 100K might be an issue.

Conclusions There is no free lunch! you can get the elegant way (accepted answer) but it still takes as long as the more expensive operation (i.e. counting all results). In real life, I will go with the 2 SQL approach, one to return top 1000 rows for display, and one that is ASYNC and updates some AJAX panel with the count(*) results that will take much much longer to computer

+5  A: 

Personally I'd opt for two statements hitting the database. One to retrieve the count, one to retrieve the first 1000 records.

You could run both queries in a batch to squeeze a little extra performance by saving a round-trip to the database.

-- Get the count
select count(*) from table where [criteria]

-- Get the data
select [cols] from table where [criteria]
Neil Barnwell
You could also UNION those two if you want. Or have them in a stored proc.
Filip Ekberg
You can't union them unless they have the same number and types of columns, which isn't very likely.
Kibbee
This, but use count(PK), not count(*).
Greg Hurlman
Greg - that's a false optimization and a common misconception. There's no difference in performance. Count(anything) just counts rows, period, and the optimizer knows it.
Dave Markle
@Neil - this can be done in one simple statement in SQL 2005. No need for two queries.
Dave Markle
I will do this, but not in batch nor sequence. I will do the TOP 1000 for the results, and the count(*) as a separate ASYNC operation that when retunrs, updates a AJAX panel. making the results come faster...
csmba
@Dave Thanks for the tip - I'm assuming that's new for SQLS2005? Wonder what else I've missed...
Neil Barnwell
+3  A: 

You want to use "count" and group by, check out this reference: http://msdn.microsoft.com/en-us/library/ms175997.aspx

Also your problem seems to be posted here: http://www.eggheadcafe.com/software/aspnet/32427870/select-top-n-plus-a-count.aspx

Good Luck

Filip Ekberg
Why isn't this at the top?! This is the correct answer.
Dave Markle
Good Question, someone should press "Mark as Answer" ;)
Filip Ekberg
A: 

You may be over-estimating the performance benefit of the "top 1000", especially if there would only be ~5000 total results.

The server already has to do all the joins and stuff (which is typically the hard part), then has to order the result set the way you specified, THEN finally takes the 1000 first results.

Two options here:

1) Do one query with a Count(*) to get the count of results, then do a second query with your top 1000, retrieving the appropriate columns (as Neil suggests). OR
2) Retrieve all rows the first time, cache them in a result set, then only display 1000 rows to the user.

It might sound like the first would be faster, but the second only has to hit the database once, and depending on the specifics of your database and query, might be better (as long as the database isn't going to return 100,000 rows!)

BradC
without top, there might be 100K rows...
csmba
then you definitely don't want to cache them client-side!
BradC
A: 

Since you're using SQL Server 2005 you get to use a CTE for this kind of query. Here's what I'm currently doing for a client:

;WITH Search_Results AS
(
     SELECT TOP(@system_max_rows)
          my_column1,
          my_column2,
          ROW_NUMBER() OVER
          (
               ORDER BY
                    -- Your order criteria here
          ) AS row_num,
          COUNT(my_column1) OVER (PARTITION BY '') As total_count
     FROM
          My_Table
     -- Put any joins here
     WHERE
          -- Put WHERE criteria here
)
SELECT
     my_column1,
     my_column2,
     row_num,
     total_count
FROM
     Search_Results
WHERE
     ((row_num - 1)/@rows_per_page) + 1 = CASE
                WHEN ((total_count - 1)/@rows_per_page) + 1 < @page_number THEN ((total_count - 1)/@rows_per_page) + 1
                ELSE @page_number
           END
OPTION (RECOMPILE)

The recompile is there because the search criteria can change drastically between calls to the stored procedure making a cached query plan bad. Hopefully the parameters/variables are obvious. This was for a paged search solution. The system max rows variable is hard-coded so that even the application can't override the maximum number of rows that can be returned and crash the server. To get the top 1000 you would pass in @page_number = 1 and @rows_per_page = 1000.

Tom H.
A: 

This won't answer your questions, but I think this a good case where we implement the:

  • Hardware is cheap
  • Programmers are not cheap

Rule.

Yes doing two queries (one count, one select *) is not an optimal solution (you could do it in one query!). But how sure are you that this is going to be a bottleneck?

I get stuck on these questions and I have to remember this myself to continue developing.

Write it fast, write it better, PROFILE, then write it best.

Or, maybe this is the bottleneck and thats why you're asking. But it seems to me that if it was your bottleneck, a better optimization would to be restructure the database so that your query itself is simplified.

jskulski
don't be silly. I would not waste anyone's time. I AM SURE!It takes 10 min to return 100K rows from a DB of tens of millions of rows. It takes only few sec to get top 1000. So yes, this is a real problem and worth my programmers time, and no HW upgrade can fix it.
csmba
+7  A: 

This is dead simple in SQL 2005. Why so many long answers?

SELECT TOP 1000 x,y,z, COUNT(*) OVER () AS TotalCount
FROM dbo.table
Dave Markle
Very clever - thanks. More examples here: http://www.sqlteam.com/article/sql-sever-2005-using-over-with-aggregate-functions
Neil Barnwell