views:

106

answers:

6

What's the difference between these two queries? I've been resistant to jumping on the ANSI-syntax bandwagon because I have not been able to unravel various syntactical ambiguities.

Is 1) returning the product of the join and only then filtering out those joined records which have weight >= 500? And is 2) filtering out those prior to the join?

Is 2 bad syntax? Why might I use that?

1:

SELECT SOMETHING
FROM FOO
INNER JOIN BAR
ON FOO.NAME = BAR.NAME
WHERE BAR.WEIGHT < 500

2:

SELECT SOMETHING
FROM FOO
INNER JOIN BAR
ON FOO.NAME = BAR.NAME AND BAR.WEIGHT < 500
+3  A: 

The output is the same since it is an inner join, and the query plan should be the same. I prefer the syntax in #1 as it separates the joining clause from the filter clause, and is thus more readable to me. This makes it easier to double-check your ON clauses when reading over a large query, as it reduces the noise that would be there if filter clauses were mixed in.

When doing outer joins, this is no longer a matter of preference - where you put BAR.WEIGHT < 500 would have a direct effect on the output.

RedFilter
+3  A: 

Where it makes a difference is when you have OUTER JOINS, then you will get different results from the queries.

See http://wiki.lessthandot.com/index.php/WHERE_conditions_on_a_LEFT_JOIN for explanation as to why.

HLGEM
+5  A: 

"Is 1) returning the product of the join and only then filtering out those joined records which have weight >= 500? And is 2) filtering out those prior to the join?"

true, except that logically, 2) is applying the filter as part of the join, not prior to the the join. For an inner join, however, this distinction will have no effect on the final resultset of the query. For Outer joins, otoh, this distinction can alter the results.

NOTE: Using the word logically (saying that this is the sequence in which the query processor logically performs these steps) is important, and is meant literally. The actual sequence used by the proccessor may or may not conform to this logical description. It is only guaranteeed that the results of the query will appear as though the processor performed these steps in this order...

Charles Bretana
I somewhat disagree. SQL is a 4GL and doesn't pretend to specify *how* the query should be run; it is a specification of what results are required. I guess, however, when teaching SQL to students it is probably helpful to describe how the clauses work in a procedural fashion.
Jeffrey Kemp
@Jeffrey, perhaps my NOTE was not worded well. That is exactly what I intended to communicate... that the "explanation" does not describe what the query processor is actually doing, only what the results are... it guarantees only that the results will `look like` the processor performed them in the described sequence...
Charles Bretana
A: 

Eyeballing the queries it certainly seems that your statements 1) and 2) might be correct; however, the optimizer may have a different opinion. I suggest you should run these queries through EXPLAIN PLAN and see if the generated plans are actually different.

I'm curious, though - what syntactical ambiguities are causing you concerns? I much prefer the ANSI syntax because to me it's much clearer what's going on.

Share and enjoy.

Bob Jarvis
A: 

Is 2 bad syntax? Why might I use that?

There's nothing wrong syntactically with placing a predicate in the JOIN clause or in the WHERE clause, either way works in Oracle. Oracle doesn't actually care anyway :)

Logically, I will sometimes use the JOIN and WHERE clauses to make large, complicated queries more self-documenting. For example:

SELECT dept.name, SUM(emp.salary)
FROM   dept
JOIN   emp
       ON dept.deptno = emp.deptno
          AND emp.commission IS NULL
WHERE  dept.region = :region
GROUP BY dept.name;

This query has three predicates, one of which is a join criterion. I've put the emp.commission predicate in the JOIN clause to indicate to other developers that this is a "non-negotiable" - in the context of this query that predicate should always be there. The dept.region predicate is in the WHERE clause to indicate that it is the user-supplied criteria.

This is a style choice, there is no rule, and I wouldn't do it this way all the time. It helps, however, when a query joins many tables, and when there are dozens of predicates. The JOIN clauses allow predicates to be "grouped" nicely, where otherwise we'd sort the WHERE clause so that predicates were grouped together logically.

Jeffrey Kemp
A: 

Conditions on the WHERE clause are applied to the final result-set.

Conditions on the ON clause are only applied to the joined operation.

So identify the conditions for the final result-set and join operations, then add them in the right place.

In the case of INNER JOIN there is no difference, because each and every inner join operations are responsible for the final result-set (not true for outer join operation). So it is a serious issue for outer join cases.

Adding the conditions in the right place has following added advantages,

  1. Maintainable
    • Easy to change from inner join to outer join. No worries about conditions.
    • Easy to remove a join table from the query.
    • Easy to change application specific conditions. i.e. Join conditions are same for all the scenarios. But application specific conditions (inside WHERE conditions) may vary for scenario to scenario.
  2. Readable: Easier to identify the relationships between tables.
Sujee