In a post titled something like "a small change you've done that has increased the performance of your application", I saw a comment about changing from:

SELECT U.userid,groups_in=(
    SELECT COUNT(*) 
    FROM usersgroup 
    WHERE userid=U.userid) 
FROM users U

to:

SELECT U.userid, groups_in 
FROM users U 
LEFT JOIN (
    select userid, groups_in=count(*) 
    from usersgroup 
    group by userid) GROUPS 
    ON GROUPS.userid = U.userid

And I thought "oh, that's the kind of thing I've been doing wrong!!". However, I tried both queries in the same environment, and both give me the same execution time and exactly the same execution plan.

Is there a better way to do the same operation? Are both of those queries absolutely fine?
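
One point worth checking when comparing the two forms: they may not return identical values for users with no groups. The following is a minimal sketch, assuming SQL Server 2008 or later and using hypothetical sample data in table variables (the table and column names mirror the question; the groupid column is invented for illustration). It wraps the derived-table version in COALESCE so both forms report 0.

```sql
-- Hypothetical sample data; user 3 belongs to no groups.
DECLARE @users TABLE (userid int PRIMARY KEY);
DECLARE @usersgroup TABLE (userid int, groupid int);

INSERT INTO @users (userid) VALUES (1), (2), (3);
INSERT INTO @usersgroup (userid, groupid) VALUES (1, 10), (1, 20), (2, 10);

-- Correlated subquery: COUNT(*) over zero rows is 0, so user 3 gets 0.
SELECT U.userid,
       groups_in = (SELECT COUNT(*)
                    FROM @usersgroup G
                    WHERE G.userid = U.userid)
FROM @users U;

-- Derived table + LEFT JOIN: user 3 has no matching row in GROUPS,
-- so groups_in would be NULL without the COALESCE.
SELECT U.userid,
       groups_in = COALESCE(GROUPS.groups_in, 0)
FROM @users U
LEFT JOIN (
    SELECT userid, groups_in = COUNT(*)
    FROM @usersgroup
    GROUP BY userid
) GROUPS ON GROUPS.userid = U.userid;
```

If NULL is acceptable for users without groups, the COALESCE can be dropped.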

+2  A: 

The SQL Server optimiser seems to get better and better with each new version, service pack and hotfix. I can't count the number of times I have seen it execute [horrible convoluted mess] using the same efficient plan as [simple elegant equivalent].

Look to your table and index design for efficiency savings first, then clean up your queries if they are still running slowly.

Christian Hayter
+1  A: 

Try:

SELECT U.userid, COUNT(G.userid) as groups_in
FROM users U LEFT JOIN usersgroup G ON G.userid = U.userid
GROUP BY U.userid;

This avoids subqueries, which are very bad for the optimizer.

Make sure you have an index on the "userid" column in both tables.
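
As a rough sketch of that indexing advice, assuming the users and usersgroup tables from the question and no existing indexes on these columns (the index names are hypothetical):

```sql
-- Index names are made up for illustration.
-- Skip the first statement if users.userid is already the primary key,
-- since a primary key is backed by an index.
CREATE INDEX IX_users_userid ON users (userid);

-- Supports both the join and the per-user count on usersgroup.
CREATE INDEX IX_usersgroup_userid ON usersgroup (userid);
```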

J-16 SDiZ
This would give 1 for groups_in when no usergroups are found.
Andomar
@Andomar: Thanks. I have changed it to COUNT(G.userid), to ignore NULL G
J-16 SDiZ
Yes, this would be better. I should probably have written a different example where the count can't be achieved using GROUP BY.
tricat
A: 

Do you really need to use count(*)?

You can improve performance drastically if you name the column in lieu of the asterisk, or use count(1).

I would also usually avoid a select inside a select.

waqasahmed
Bear in mind that if you use a specific column that has null values in it, the null values will not be included, leading to a potentially different count.
adrianbanks
A: 

An alternative method is this, which sums the number of rows that are not null.

select 
    u.userId
,   sum(case when ug.userId is not null then 1 else 0 end) as groups_in
from
    users u
    left join usersgroup ug on u.userId = ug.userId
group by
    u.userId
Jeff Meatball Yang
A: 

This seems the most natural way to write it:

SELECT U.userid, COUNT(g.userid) as groups_in
FROM users U 
LEFT JOIN usersgroup G ON G.userid = U.userid
GROUP BY U.userid

COUNT(*) would return 1 even for users without a usergroup. COUNT(g.userid) returns 0 if no usergroup is found.
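
The difference can be seen side by side in a small sketch, assuming SQL Server 2008 or later and hypothetical sample data in table variables (the groupid column and the count_star alias are invented for illustration):

```sql
-- Hypothetical sample data; user 3 belongs to no groups.
DECLARE @users TABLE (userid int PRIMARY KEY);
DECLARE @usersgroup TABLE (userid int, groupid int);

INSERT INTO @users (userid) VALUES (1), (2), (3);
INSERT INTO @usersgroup (userid, groupid) VALUES (1, 10), (1, 20), (2, 10);

SELECT U.userid,
       COUNT(*)        AS count_star,  -- counts the NULL-extended row as well
       COUNT(G.userid) AS groups_in    -- ignores rows where G.userid is NULL
FROM @users U
LEFT JOIN @usersgroup G ON G.userid = U.userid
GROUP BY U.userid;
-- For userid 3: count_star = 1, groups_in = 0
```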

Andomar
A: 

One thing to take into consideration is that the SQL Server query optimizer is cost-based. In other words, it will inspect your query, indexes, statistics and other factors to create a query plan before executing the query. You need a representative set of data to test your query against.
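
One way to compare candidates on such data is to turn on SQL Server's I/O and timing output and run each query in turn. This is only a sketch, assuming the users and usersgroup tables from the question already exist and hold a realistic volume of rows:

```sql
-- Report logical reads and CPU/elapsed time for each statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Candidate 1: correlated subquery
SELECT U.userid,
       groups_in = (SELECT COUNT(*)
                    FROM usersgroup G
                    WHERE G.userid = U.userid)
FROM users U;

-- Candidate 2: join plus GROUP BY
SELECT U.userid, COUNT(G.userid) AS groups_in
FROM users U
LEFT JOIN usersgroup G ON G.userid = U.userid
GROUP BY U.userid;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The figures appear in the Messages tab in Management Studio. If both candidates show the same reads and similar times, the optimizer has most likely chosen equivalent plans.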

Kim Major