views:

119

answers:

5

Is this SQL query:

SELECT p.id, p.[name], SUM(ps.sales_amount) AS GROSS_SALES
FROM products p
  LEFT OUTER JOIN product_sales ps ON p.id = ps.product_id
GROUP BY p.id, p.[name]

better than:

SELECT SUM([t2].[value]) AS [SalesAmount], [t2].[id] AS [ProductId], [t2].[name] AS [ProductName]
FROM (
    SELECT (
        SELECT SUM([t1].[sales_amount])
        FROM [dbo].[product_sales] AS [t1]
        WHERE [t1].[product_id] = [t0].[id]
        ) AS [value], [t0].[id], [t0].[name]
    FROM [dbo].[products] AS [t0]
    ) AS [t2]
GROUP BY [t2].[id], [t2].[name]

The second one is the output of a LINQ-to-SQL query. I'm still trying to find a way to rewrite the LINQ expression...

Which of the two would be the more optimized SQL query?

Your thoughts? Thanks!

+1  A: 

Well, the first query is better because it is more readable. However, the second query was machine-generated, which explains its shape, and since you won't (and shouldn't) have to deal with the actual SQL produced from the LINQ expression, there probably isn't a problem with it.

I would assume that the SQL Server engine can flatten that out into a standard join like the first query. If you want to know for sure, fire up SQL Server Management Studio and compare the actual execution plans of both.
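If you want numbers as well as plan shapes, a sketch along these lines (reusing the table names from the question; the second statement is abbreviated here) prints I/O and CPU figures for each query on the Messages tab. In SSMS, "Include Actual Execution Plan" (Ctrl+M) additionally shows the two plans side by side:

```sql
-- Print logical reads and CPU time for each statement in this session.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Hand-written join version:
SELECT p.id, p.[name], SUM(ps.sales_amount) AS GROSS_SALES
FROM products p
  LEFT OUTER JOIN product_sales ps ON p.id = ps.product_id
GROUP BY p.id, p.[name];

-- Run the LINQ-generated version here as well, then compare the
-- "logical reads" and "CPU time" lines for the two statements.

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

If the optimizer flattens the correlated subquery, the two plans (and their read counts) should come out identical.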

tster
+1  A: 

As a general statement this seems like a fairly abusive subquery, and on that basis I'd predict that the first query will be faster. Why not try them and see?

Mark E
+1  A: 

The first one is more efficient. In general, joins perform better than correlated subqueries, and SQL Server does not always optimize subqueries into their logical join equivalents. The query is so small, though, that either should perform reasonably well, and it will probably make no difference. You'll need to look at the query plans to see the difference, or compare both executions using a trace.

The first is, of course, also more readable. The second is what you tend to get from LINQ-to-SQL these days; then again, the whole point is that you shouldn't have to deal with the SQL yourself.

The expectation is that the queries generated by LINQ-to-SQL will become more and more efficient over time.

Patrick Karcher
Did you actually check to see what SQL Server did with the second query?
tster
You can't really say that without knowing how the query rewriter and optimizer behave in this case.
codekaizen
+1  A: 

My hunch is that both should be the same, performance-wise. The left join is probably going to be a nested loop join, which is exactly what the subquery version will do.

But don't take my word for it - as everyone else is suggesting, check out the execution plan and see for yourself which one is better. Or turn on client statistics and see which one takes longer.

Aaronaught
+1  A: 

The real answer is that you can't tell without measuring. Even then you don't really know unless you measure on a realistic dataset.

Check out this blog post (not by me) where the generated SQL looked worse but performed much better. His example specifically deals with subselects vs. joins, so I think it is very relevant to your situation.

Jere.Jones
The `LEFT JOIN/IS NULL` mistake is a common one. It prevents SQL Server from performing an anti semi join and almost never performs as well as people think it does. It's good to keep in mind but doesn't really apply here, as this is not an anti join.
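For reference, the pattern being described looks like this (table names borrowed from the question; this is an illustration of the anti-join idiom, not a query the asker needs):

```sql
-- LEFT JOIN / IS NULL form: "products with no sales rows".
SELECT p.id, p.[name]
FROM products p
  LEFT JOIN product_sales ps ON ps.product_id = p.id
WHERE ps.product_id IS NULL;

-- Equivalent NOT EXISTS form, which expresses the anti join directly
-- and is generally the safer way to write it in SQL Server:
SELECT p.id, p.[name]
FROM products p
WHERE NOT EXISTS (
    SELECT 1
    FROM product_sales ps
    WHERE ps.product_id = p.id
);
```

The query in the question is an outer aggregation, not an anti join, which is why the comment notes the pitfall doesn't apply here.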
Aaronaught
True. This isn't about an anti-semi-join, but what I took from the blog post was that sometimes the optimizer and/or LINQ is smarter than I am. :) I think that could apply here.
Jere.Jones