ansaurus

Question

Transform arbitrary SQL SELECT TOP(x) to a SELECT COUNT(*)?

Answer 1

+3 A:

Inline view:

select count(*)
from (...slightly transformed query...) t

The slightly transformed query is the original query with these changes:

If the select clause contains any columns without names, such as select ... avg(x) ..., then do one of the following: 1) alias the column, such as avg(x) as AvgX; 2) remove the column, but make sure at least one column is left; or, my favorite, 3) just make the select clause select 1 as C.
Remove TOP from select clause.
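Remove the order by clause. Ordering does not affect the count, and SQL Server generally rejects an order by in an inline view that has no TOP.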
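
As a sketch of the steps above, assume a hypothetical Orders table with CustomerId and Amount columns (these names are illustrative and do not come from the question). The original query has an unnamed avg column, a TOP, and an order by; the two rewrites apply option 1 and option 3.

```sql
-- Hypothetical original query; Orders, CustomerId and Amount are assumed names.
select top (10) CustomerId, avg(Amount)
from Orders
group by CustomerId
order by avg(Amount) desc

-- Option 1: alias the unnamed column, then drop TOP and order by.
select count(*)
from (select CustomerId, avg(Amount) as AvgAmount
      from Orders
      group by CustomerId) t

-- Option 3: replace the select list with a constant, then drop TOP and order by.
select count(*)
from (select 1 as C
      from Orders
      group by CustomerId) t
```

Both rewrites count one row per CustomerId group. Option 3 only preserves the count when the original select has no DISTINCT, because replacing the select list with a constant would collapse the distinct rows into one.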

EDIT 1 Fixed by adding aliases for the inline view and dealing with unnamed columns in select clause.

EDIT 2 But what about the performance? Doesn't this require the DB to run the big query that I wanted to avoid in the first place with TOP(X)?

Not necessarily. For some queries, this count may do more work than the TOP(x) would. And for a particular query, you may be able to make the equivalent count faster by making additional changes that remove work not needed for the final count. But those simplifications cannot be included in a general method that takes any arbitrary SELECT TOP(X) query, which would normally return a large number of rows without the X limit, and transforms it into a query that counts how many rows would have been returned without the TOP(X).
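
To illustrate the kind of query-specific simplification meant here, a minimal sketch with hypothetical Orders and Customers tables (assumed names, not from the question). The general transform keeps a join that the count does not need; dropping it by hand is only valid under the stated assumption.

```sql
-- General transform: the left join is kept because the method does not reason about it.
select count(*)
from (select o.OrderId, c.Name
      from Orders o
      left join Customers c on c.CustomerId = o.CustomerId) t

-- Query-specific simplification: valid only if CustomerId is unique in Customers,
-- so that the left join can neither add nor remove rows.
select count(*)
from Orders
```

A general method cannot make this change because it would have to know that the join key is unique, which depends on the schema rather than on the query text.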

And in some cases, the query optimizer may optimize away the unneeded work, so that the DB does not have to run the big query.

For example, here is a test table and data, using SQL Server 2005:

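-- The primary key on PK gets the clustered index;
-- the unique constraint on u creates a separate non-clustered index.
create table t (PK int identity(1, 1) primary key,
  u int not null unique,
  string VARCHAR(2000))

-- Cross join sysobjects with itself to generate plenty of rows, then keep 100000 of them.
insert into t (u, string)
select top 100000 row_number() over (order by s1.id), replace(space(2000), ' ', 'x')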
from sysobjects s1, 
    sysobjects s2, 
    sysobjects s3, 
    sysobjects s4, 
    sysobjects s5, 
    sysobjects s6, 
    sysobjects s7

The non-clustered index on column u will be much smaller than the clustered index on column PK.
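
One way to check the size difference, as a sketch: sys.indexes and sys.dm_db_partition_stats are system views available from SQL Server 2005, and this query lists the pages used by each index on t. It assumes the table was created in the current database and that the login is allowed to view database state.

```sql
-- Pages used per index on table t; index names are generated by SQL Server.
select i.name, i.type_desc, ps.used_page_count, ps.row_count
from sys.indexes i
join sys.dm_db_partition_stats ps
  on ps.object_id = i.object_id
 and ps.index_id = i.index_id
where i.object_id = object_id('t')
```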

Then set up SSMS to show the actual execution plan for:

select PK, U, String from t
select count(*) from t

The first select does a clustered index scan, because it needs to return data out of the leaves. The second query does an index scan on the smaller non-clustered index created for the unique constraint on U.

Applying the transform of the first query we get:

select count(*)
from (select PK, U, String from t) t

Running that and looking at the plan, the index on U is used again: it is exactly the same plan as for select count(*) from t. The leaves are not visited to find the values for String on every row.
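
Besides the graphical plan, the page reads of the two counts can be compared with set statistics io, a standard T-SQL session setting. This is only a sketch against the test table t above; the read counts appear in the Messages tab and will vary by system.

```sql
set statistics io on

-- Plain count and transformed count; compare the logical reads reported for each.
select count(*) from t
select count(*) from (select PK, U, String from t) t

set statistics io off
```

If both statements report similar logical reads, that is consistent with the optimizer using the same narrow index for both.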

Shannon Severance 2009-08-07 21:19:25

but what about the GROUP BY problem mentioned in the question?

Carlos Rendon 2009-08-07 21:22:21

GROUP BY can be on a query that is within FROM ( ... )

KM 2009-08-07 21:24:37

mssql gives me a syntax error with this query on the last ')' any ideas?

Carlos Rendon 2009-08-07 21:35:48

@Carlos Rendon, edit your question and show the code...

KM 2009-08-07 21:45:12

@KM I added code to the post

Carlos Rendon 2009-08-07 21:53:33

not sure if this is your problem, but sometimes the inner query wants an alias. Try select count(*) from (old query) t

Beth 2009-08-07 22:10:38

@Beth, thanks! that was my problem

Carlos Rendon 2009-08-07 22:24:07

But what about the performance? Doesn't this require the DB to run the big query that I wanted to avoid in the first place with TOP(X)?

Carlos Rendon 2009-08-07 22:30:59

@Beth, thank you for catching that.

Shannon Severance 2009-08-08 02:15:57

@Shannon thanks for the comprehensive edit

Carlos Rendon 2009-08-09 16:55:23
