tags:
views: 4576
answers: 7

I have two tables containing Tasks and Notes, and want to retrieve a list of tasks with the number of associated notes for each one. These two queries do the job:

select t.TaskId,
       (select count(n.TaskNoteId) from TaskNote n where n.TaskId = t.TaskId) 'Notes'
from   Task t

-- or
select t.TaskId,
       count(n.TaskNoteId) 'Notes'
from   Task t
left join
       TaskNote n
on     t.TaskId = n.TaskId
group by t.TaskId

Is there a difference between them and should I be using one over the other, or are they just two ways of doing the same job? Thanks.

A: 

You can use either, and they are semantically identical. In general, the rule of thumb is to use whichever form is easier for you to read, unless performance is an issue.

If performance is an issue, then experiment with rewriting the query using the other form. Sometimes the optimizer will use an index for one form, and not the other.
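To see that the two forms really do produce the same rows, here is a minimal sketch using Python's sqlite3 module (the question is about SQL Server, but this particular SQL is portable; table and column names follow the question, the sample data is made up):

```python
import sqlite3

# In-memory database with the two tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Task (TaskId INTEGER PRIMARY KEY);
    CREATE TABLE TaskNote (TaskNoteId INTEGER PRIMARY KEY, TaskId INTEGER);
    INSERT INTO Task VALUES (1), (2), (3);
    INSERT INTO TaskNote VALUES (10, 1), (11, 1), (12, 2);
""")

# Form 1: correlated subquery in the select list.
subquery_form = conn.execute("""
    SELECT t.TaskId,
           (SELECT COUNT(n.TaskNoteId)
            FROM TaskNote n
            WHERE n.TaskId = t.TaskId) AS Notes
    FROM Task t
    ORDER BY t.TaskId
""").fetchall()

# Form 2: left join plus GROUP BY.
join_form = conn.execute("""
    SELECT t.TaskId, COUNT(n.TaskNoteId) AS Notes
    FROM Task t
    LEFT JOIN TaskNote n ON t.TaskId = n.TaskId
    GROUP BY t.TaskId
    ORDER BY t.TaskId
""").fetchall()

print(subquery_form)               # [(1, 2), (2, 1), (3, 0)]
print(join_form == subquery_form)  # True
```

Note that both forms return 0 (not NULL) for task 3, which has no notes, because COUNT over an empty set or over all-NULL values is 0.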

Sean Reilly
A: 

There's no clear-cut answer on this. You should compare the execution plans. In terms of relational algebra, they are essentially equivalent.

ilitirit
+9  A: 

On small datasets the two are a wash when it comes to performance. When indexed, the left outer join is a little better.

I've found that on large datasets a join (an inner join will work too) will outperform the subquery by a very large factor (sorry, no numbers).

Austin Salonen
Ditto on the inner join - I've seen that make a big difference on large sets (assuming that's the correct behavior you want; you might need to use an ISNULL() or similar function)
Cade Roux
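The caveat in the comment above is worth seeing concretely: switching to an inner join silently drops tasks that have no notes. A small SQLite sketch (same hypothetical schema as the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Task (TaskId INTEGER PRIMARY KEY);
    CREATE TABLE TaskNote (TaskNoteId INTEGER PRIMARY KEY, TaskId INTEGER);
    INSERT INTO Task VALUES (1), (2);
    INSERT INTO TaskNote VALUES (10, 1);  -- task 2 has no notes
""")

# Inner join: only tasks that have at least one note survive.
inner = conn.execute("""
    SELECT t.TaskId, COUNT(n.TaskNoteId) AS Notes
    FROM Task t
    JOIN TaskNote n ON t.TaskId = n.TaskId
    GROUP BY t.TaskId
    ORDER BY t.TaskId
""").fetchall()

# Left join: every task appears, with a count of 0 where there are no notes.
left = conn.execute("""
    SELECT t.TaskId, COUNT(n.TaskNoteId) AS Notes
    FROM Task t
    LEFT JOIN TaskNote n ON t.TaskId = n.TaskId
    GROUP BY t.TaskId
    ORDER BY t.TaskId
""").fetchall()

print(inner)  # [(1, 1)] -- task 2 vanishes
print(left)   # [(1, 1), (2, 0)]
```

So the inner join is only a valid optimization if you don't need rows for tasks without notes (or you restore them some other way, e.g. with ISNULL()/COALESCE() over an outer aggregate).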
+3  A: 

In most cases, the optimizer will treat them the same.

I tend to prefer the second, because it has less nesting, which makes it easier to read and easier to maintain. I have started to use SQL Server's common table expressions to reduce nesting as well for the same reason.

In addition, the second syntax is more flexible if further aggregates beyond COUNT may be added in the future, like MIN(some_scalar), MAX() or AVG().
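As a sketch of both points - reduced nesting via a CTE and room for extra aggregates - here is one possible rewrite, run through SQLite (which also supports WITH; the CTE name NoteStats and the extra MIN/MAX columns are hypothetical additions, not part of the original question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Task (TaskId INTEGER PRIMARY KEY);
    CREATE TABLE TaskNote (TaskNoteId INTEGER PRIMARY KEY, TaskId INTEGER);
    INSERT INTO Task VALUES (1), (2);
    INSERT INTO TaskNote VALUES (10, 1), (11, 1);
""")

# The aggregation lives in one named block; the outer query stays flat,
# and adding another aggregate is a one-line change inside the CTE.
rows = conn.execute("""
    WITH NoteStats AS (
        SELECT TaskId,
               COUNT(TaskNoteId) AS Notes,
               MIN(TaskNoteId)   AS FirstNoteId,
               MAX(TaskNoteId)   AS LastNoteId
        FROM TaskNote
        GROUP BY TaskId
    )
    SELECT t.TaskId, s.Notes, s.FirstNoteId, s.LastNoteId
    FROM Task t
    LEFT JOIN NoteStats s ON t.TaskId = s.TaskId
    ORDER BY t.TaskId
""").fetchall()

print(rows)  # [(1, 2, 10, 11), (2, None, None, None)]
```

One wrinkle: with the aggregation moved into the CTE, a task with no notes gets NULL rather than 0 for Notes, so you may want ISNULL(s.Notes, 0) (COALESCE in portable SQL) in the outer select.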

Cade Roux
+2  A: 

If you're using SQL Server Management Studio, you can enter both versions into the Query Editor and then right-click and choose Display Estimated Execution Plan. It will give you two percentage costs relative to the batch. If they're expected to take the same time, they'll both show as 50% - in which case, choose whichever you prefer for other reasons (easier to read, easier to maintain, better fit with your coding standards etc). Otherwise, you can pick the one with the lower percentage cost relative to the batch.

You can use the same technique to look at changing any query to improve performance by comparing two versions that do the same thing.

Of course, because it's a cost relative to the batch, it doesn't mean that either query is as fast as it could be - it just tells you how they compare to each other, not to some notional optimum query to get the same results.
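Outside Management Studio, the same compare-the-plans technique works with whatever plan facility your engine exposes. As an illustrative sketch, SQLite's EXPLAIN QUERY PLAN plays a roughly analogous role (the output format and costs are engine-specific and differ from SQL Server's):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Task (TaskId INTEGER PRIMARY KEY);
    CREATE TABLE TaskNote (TaskNoteId INTEGER PRIMARY KEY, TaskId INTEGER);
    CREATE INDEX ix_TaskNote_TaskId ON TaskNote(TaskId);
""")

queries = (
    """SELECT t.TaskId,
              (SELECT COUNT(n.TaskNoteId) FROM TaskNote n
               WHERE n.TaskId = t.TaskId) AS Notes
       FROM Task t""",
    """SELECT t.TaskId, COUNT(n.TaskNoteId) AS Notes
       FROM Task t
       LEFT JOIN TaskNote n ON t.TaskId = n.TaskId
       GROUP BY t.TaskId""",
)

# Collect and print the plan for each form; each plan row's last column
# is a human-readable step such as "SCAN Task" or an index lookup.
plans = []
for sql in queries:
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    plans.append(plan)
    print(sql)
    for row in plan:
        print("   ", row)
```

The point is the same as with SSMS: look at what each form actually does (scans vs. index lookups) rather than guessing which will be faster.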

Simon Forrest
+4  A: 

The subquery will be slower as it is being executed for every row in the outer query. The join will be faster as it is done once. I believe that the query optimiser will not rewrite this query plan as it can't recognize the equivalence.

Normally you would do a join and group by for this sort of count. Correlated subqueries of the sort you show are mainly of interest if they have to do some grouping or more complex predicate on a table that is not participating in another join.

ConcernedOfTunbridgeWells
A: 

I make it a point to avoid subqueries wherever possible. The join will generally be more efficient.

HLGEM