I want to take any arbitrary SELECT TOP(X) query that would return a large number of rows without the X limit and transform it into a query that counts how many rows would have been returned without the TOP(X) (i.e. SELECT COUNT(*)). Remember, I am asking about an arbitrary query with any number of joins, WHERE clauses, GROUP BYs, etc.

Is there a way to do this?


Edited to show the syntax with Shannon's solution:

i.e.

`SELECT TOP(X) [colnames] FROM [tables with joins] 
 WHERE [constraints] GROUP BY [cols] ORDER BY [cols]`

becomes

`SELECT COUNT(*) FROM 
 (SELECT [colnames] FROM [tables with joins] 
 WHERE [constraints] GROUP BY [cols]) t`
+3  A: 

Inline view:

select count(*)
from (...slightly transformed query...) t

... slightly transformed query ... is the original query with these changes:

  1. If the select clause contains any columns without names, such as select ... avg(x) ..., then do one of the following: 1) alias the column, such as avg(x) as AvgX; 2) remove the column, but make sure at least one column is left; or, my favorite, 3) just make the select clause select 1 as C.
  2. Remove TOP from select clause.
  3. Remove order by clause.
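
As a rough sketch, the three steps applied to one concrete query might look like the following. The #Orders temp table, its columns, and its rows are hypothetical, made up only for illustration, and the syntax is kept to what SQL Server 2005 accepts.

```sql
-- Hypothetical sample table (not from the question), used only for illustration.
create table #Orders (OrderId int identity(1, 1) primary key,
  CustomerId int not null,
  Amount money not null)

insert into #Orders (CustomerId, Amount) values (1, 10.00)
insert into #Orders (CustomerId, Amount) values (1, 25.00)
insert into #Orders (CustomerId, Amount) values (2, 5.00)
insert into #Orders (CustomerId, Amount) values (3, 40.00)

-- Original query: has TOP, an unnamed aggregate column, GROUP BY and ORDER BY.
select top (2) CustomerId, avg(Amount)
from #Orders
group by CustomerId
order by CustomerId

-- Transformed query:
--   step 1: select clause replaced with "select 1 as C" (no unnamed columns left)
--   step 2: TOP removed
--   step 3: ORDER BY removed
-- The GROUP BY stays, so the count is the number of groups, not of base rows.
select count(*)
from (select 1 as C
      from #Orders
      group by CustomerId) t

drop table #Orders
```

With the four sample rows above there are three distinct CustomerId values, so the transformed query returns 3, while the original TOP (2) query returns only two of those three groups.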

EDIT 1 Fixed by adding aliases for the inline view and dealing with unnamed columns in select clause.

EDIT 2 But what about the performance? Doesn't this require the DB to run the big query that I wanted to avoid in the first place with TOP(X)?

Not necessarily. For some queries, this count may do more work than the TOP(x) would. And for a particular query, you may be able to make the equivalent count faster by making additional changes that remove work not needed for the final count. But those simplifications cannot be included in a general method to take any arbitrary SELECT TOP(X) query that would normally return a large number of rows (without the X limit) and transform that query into a query that counts how many rows would have been returned without the TOP(X).

And in some cases, the query optimizer may optimize away enough of the work that the DB does not have to run the big query.

For example, here is a test table and data, using SQL Server 2005:

create table t (PK int identity(1, 1) primary key,
  u int not null unique,
  string VARCHAR(2000))

insert into t (u, string)
select top 100000 row_number() over (order by s1.id) , replace(space(2000), ' ', 'x')
from sysobjects s1, 
    sysobjects s2, 
    sysobjects s3, 
    sysobjects s4, 
    sysobjects s5, 
    sysobjects s6, 
    sysobjects s7

The non-clustered index on column u will be much smaller than the clustered index on column PK.

Then set up SSMS to show the actual execution plan for:

select PK, U, String from t
select count(*) from t

The first select does a clustered index scan, because it needs to return data out of the leaves. The second query does an index scan on the smaller non-clustered index created for the unique constraint on U.

Applying the transform of the first query we get:

select count(*)
from (select PK, U, String from t) t

Running that and looking at the plan, the index on U is used again, with exactly the same plan as select count(*) from t. The leaves are not visited to find the values for String on every row.
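
If you would rather compare numbers than read the graphical plans, one possible check (a sketch, assuming the test table t created above) is to turn on SET STATISTICS IO and compare the logical reads reported for each statement in the Messages output:

```sql
-- Report page reads per statement in the Messages output.
set statistics io on

-- Returns every column, so it has to read the wide clustered index leaves.
select PK, U, String from t

-- Plain count over the table.
select count(*) from t

-- Count over the transformed query (inline view with an alias).
select count(*)
from (select PK, U, String from t) t

set statistics io off
```

If the optimizer treats the two counts the same way, as described above, they should report similar logical reads, both far lower than the first select. The exact figures depend on the data and the server, so none are quoted here.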

Shannon Severance
but what about the GROUP BY problem mentioned in the question?
Carlos Rendon
GROUP BY can be on a query that is within FROM ( ... )
KM
mssql gives me a syntax error with this query on the last ')' any ideas?
Carlos Rendon
@Carlos Rendon, edit your question and show the code...
KM
@KM I added code to the post
Carlos Rendon
not sure if this is your problem, but sometimes the inner query wants an alias. Try select count(*) from (old query) t
Beth
@Beth, thanks! that was my problem
Carlos Rendon
But what about the performance? Doesn't this require the DB to run the big query that I wanted to avoid in the first place with TOP(X)?
Carlos Rendon
@Beth, thank you for catching that.
Shannon Severance
@Shannon thanks for the comprehensive edit
Carlos Rendon