tags:

views:

544

answers:

10

When joining to a subset of a table, any reason to prefer one of these formats over the other?

Subquery version:

SELECT ...
FROM Customers AS c
INNER JOIN (SELECT * FROM Classification WHERE CustomerType = 'Standard') AS cf
    ON c.TypeCode = cf.Code
INNER JOIN SalesReps s ON cf.SalesRepID = s.SalesRepID

vs the WHERE clause at the end:

SELECT ...
FROM Customers AS c
INNER JOIN Classification AS cf ON c.TypeCode = cf.Code
INNER JOIN SalesReps AS s ON cf.SalesRepID = s.SalesRepID
WHERE cf.CustomerType = 'Standard'

The WHERE clause at the end feels more "traditional", but the first is arguably more clear, especially as the joins get increasingly complex.

Only other reason I can think of to prefer the second is that the "SELECT *" on the first might be returning columns that aren't used later (In this case, I'd probably only need to return cf.Code and Cf.SalesRepID)

+3  A: 

The second clause is definitely more clear, and I suspect the optimizer will like it better too. And ideally you should specify the columns you need.

Otávio Décio
A: 

I always prefer second variant because if you use first and query becomes very complex there could be negative side as performance.

IordanTanev
A: 

I would say that in equivalent join/subquery situations the optimizer is going to produce similar execution plans and which route you use should be driven by what provides the most clarity of intent in the query. (e.g. choose based on maintainability)

cmsjr
A: 

I would always go with the second until I'm forced to use an alternative.

Keep joins in the FROM and conditions in the WHERE.

Robin Day
+3  A: 

What about a third option?

SELECT ...
FROM Customers AS c
INNER JOIN Classification AS cf 
    ON cf.CustomerType = 'Standard' 
    AND c.TypeCode = cf.Code
INNER JOIN SalesReps AS s 
    ON cf.SalesRepID = s.SalesRepID

Personally, I prefer to use JOIN syntax to indicate the statements on which the overall set is defined, the foreign keys or other conditions that indicate two rows should be joined to make a row in the result set.

The WHERE clause contains the criteria which filter my result set. Arguably, this can become quite bloaty and complicated when you are performing a number of joins, however when you think in sets it follows a kind of logic:

  • SELECT what columns I want.
  • JOIN tables to define the set I want to get rows from.
  • Filter out rows WHERE my criteria are not met.

By this logic, I'd always choose your second syntax for consistent readability.

Programming Hero
+2  A: 

run

SET SHOWPLN_ALL ON

and then each query.

I would think the first may run the same plan when in a simple query, but that the second would always run the same or better, especially in more complex queries.

KM
+2  A: 

I only use subqueries when there needs to be a distinct query--such as a group by, or something overly complex.

I'd also do a variant on the second query, like so:

SELECT ...
FROM Customers AS c
INNER JOIN Classification AS cf
 ON c.TypeCode = cf.Code
 AND cf.CustomerType = 'Standard'
INNER JOIN SalesReps AS s 
ON cf.SalesRepID = s.SalesRepID

This will cut out the "extra rows" during the joing part of the query. Might not make a difference to optimizers for this query, but it definitely will in others (outer joins, further subqueries, et. al.)

Philip Kelley
+1  A: 

As others said the 2nd is a nicer choice. But also consider the implications of the filter location IF you move to an outer join. If you perhaps want to see all customers and for those customers that are classified as "Standard" you want the sales rep info then review the SQL below.

    SELECT ...
      FROM Customers AS c
 LEFT JOIN Classification AS cf
        ON c.TypeCode      = cf.Code
       AND cf.CustomerType = 'Standard'
 LEFT JOIN SalesReps s 
        ON cf.SalesRepID   = s.SalesRepID

The code below would not produce the same results as above. It would have fewer rows and be incorrect.

    SELECT ...
      FROM Customers AS c
 LEFT JOIN Classification AS cf
        ON c.TypeCode      = cf.Code
 LEFT JOIN SalesReps s 
        ON cf.SalesRepID   = s.SalesRepID
     WHERE cf.CustomerType = 'Standard'

But specifically for your question I would want to see the following version. I believe the intent is clear in this version.

    SELECT ...
      FROM Customers AS c
      JOIN Classification AS cf
        ON c.TypeCode      = cf.Code
       AND cf.CustomerType = 'Standard'
      JOIN SalesReps s 
        ON cf.SalesRepID   = s.SalesRepID
esabine
+3  A: 

The first version is a derived table. Don't confuse it with a subquery.

I would check out the various versions as far as performance (and making sure they all provide the same results, you'd be surprised how often people forget in optimizing code that the same results are important!). I would suspect the first version was written to miminize the number of records being joined to as an attempt to improve performance (derived tables often improve performance over other constructions and it may have been replacing a correlated subquery which it almost certainly would have better performance than). Whether it did or not, I would have to run in your db to see.

Basically when two constructions have the same result, my preference is to chose the faster performing one. Yes it might be a bit harder to understand (you can always add comments explaining what you did and why to help maintainers). But performance is one of the three critical things that must be considered in all database access (security and data integrity are the other two). Performance should trump ease of maintenance in a database especially for queries that are run frequently. Avoiding an extra ten minutes to understand something once a year or so when you need to look at it (and most queries are revisited less than that) is not worth extra seconds for every user every time it is run especially when it is run thousands of times a day.

HLGEM
+1, a complex query that run faster than a simple version of that same query is the best for everyone: users have the results faster, less load on hardware, and developers get better.
KM
+1  A: 
SELECT ...
FROM Customers AS c
INNER JOIN (SELECT * FROM Classification WHERE CustomerType = 'Standard') AS cf
    ON c.TypeCode = cf.Code
INNER JOIN SalesReps s ON cf.SalesRepID = s.SalesRepID


SELECT ...
FROM Customers AS c
INNER JOIN Classification AS cf ON c.TypeCode = cf.Code
INNER JOIN SalesReps AS s ON cf.SalesRepID = s.SalesRepID
WHERE cf.CustomerType = 'Standard'

SQL Server will treat both queries the same.

These queries are identical performance-wise. You can freely interchange ON, WHERE and inline view conditions: the SQL Server's optimizer is smart enough to figure out the best plan.

The first query is more easily convertable into an OUTER JOIN whenever the need arises, however, in this case it may be better worded as this:

SELECT  ...
FROM    Customers AS c
INNER JOIN -- or OUTER JOIN
        Classification AS cf
ON      cf.Code = c.TypeCode 
        AND cf.CustomerType = 'Standard'
INNER JOIN -- or OUTER JOIN
        SalesReps AS s
ON      s.SalesRepID = cf.SalesRepID

When writing the queries, I try to write them so that the key nature is obvious from the query.

If there a single column key on cf.code, I'd use this:

SELECT  ...
FROM    Customers AS c
INNER JOIN
        Classification AS cf
ON      cf.Code = c.TypeCode
INNER JOIN
        SalesReps AS s
ON      s.SalesRepID = cf.SalesRepID
WHERE   cf.CustomerType = 'Standard'

If the key is cf (Code, CustomerType), then this one:

SELECT  ...
FROM    Customers AS c
INNER JOIN
        Classification AS cf
ON      cf.Code = c.TypeCode
        AND cf.CustomerType = 'Standard'
INNER JOIN
        SalesReps AS s
ON      s.SalesRepID = cf.SalesRepID

, and if the key is cf (CustomerType, Code), then this one:

SELECT  ...
FROM    Customers AS c
INNER JOIN
        (
        SELECT  *
        FROM    Classification
        WHERE   CustomerType = 'Standard'
        ) AS cf
ON      cf.Code = c.TypeCode
INNER JOIN
        SalesReps s
ON      s.SalesRepId = cf.SalesRepID

A little side note: in MySQL, inline views are much less efficient than the joins, so I'd won't use them in this case in MySQL.

This is not the case for SQL Server, though.

Quassnoi