tags:

views:

133

answers:

5

I have two tables:

Customers(Id, Name, TownId) T
Towns(Id, Name)

I have an SQL statement like this:

SELECT *
FROM Customers
INNER JOIN Towns ON Towns.Id = Customers.TownId
WHERE Customers.Id > 5

What would happen first?
Would it filter the Customers table and then join the selected records with Towns table?
Would it join all Customers with Towns and then filter? Or is it that you cannot tell?

A: 

I believe the join will act first.

Paddy
The SQL Standard defines a particular order, which implementations must act "as if" they are working in that order (they're free do do differently, so long as the results are the same). Standard says the joins happen first, so +1 to counteract whoever down-voted.
Damien_The_Unbeliever
Thanks - it certainly was always my understanding. Fixed a few bugs due to odd join criteria...
Paddy
+11  A: 

The optimizer will do whatever it thinks will be fastest.

You can force certain behaviors with join hints or encourage certain behaviors with statistics and indexes. It's usually best to Trust the Optimizer, though.

If you want a detailed explanation of how a query is executed, look at the execution plan.

Michael Haren
The common techniques to "force" a particular QEP are almost all vendor dependant... effectively achieving vendor-lock-in. Careful with That Axe, Eugene.
corlettk
+1 for Pink Floyd. :-)
GuyBehindtheGuy
A: 

In laymans terms, the filter will happen before the join, so you won't join the entirety of both tables before filtering.

Depending on the toolset you are using, you should find an option to show the execution plan of the query, which will help you to see the order of execution and the cost of each stage.

Sohnee
A: 

Generally speaking, joining happens first. Consider:

t1.id  t1.foo      t2.id   t2.bar  t2.t1_id
-------------      ------------------------
    1   'abc'          1    '123'         1
    2   'def'          2    '456'         1
    3   'ghi'          3    '789'         2
    4   'jkl'          4    '0'        NULL

This query:

SELECT
  t1.foo,
  t2.bar
FROM
  t1
  LEFT JOIN t2 ON t1.id = t2.t1_id
WHERE
  t2.somevalue IS NOT NULL

will yield:

foo     bar
-------------
'abc'   '123'
'abc'   '456'
'def'   '789'

whereas, when you pull the filter into the join condition, filtering happens at the join as well:

SELECT
  t1.foo,
  t2.bar
FROM
  t1
  LEFT JOIN t2 ON t1.id = t2.t1_id AND t2.somevalue IS NOT NULL

will yield:

foo     bar
-------------
'abc'   '123'
'abc'   '456'
'def'   '789'
'ghi'    NULL
'jkl'    NULL

The more complex the query gets, the less easy it is to say which records are filtered out by the execution plan before the tables are joined, and which after that.

Tomalak
it's not exactly the answer to my question.. in MY example, can you guarantee that joining will always happen first?
agnieszka
In *your* example, it makes no difference at all, because you use an inner join. Filter first - join later, or the other way around, there will be no different result set no matter what you do. The edge cases are with the outer joins.
Tomalak
it makes no difference in the result, but it makes difference in the efficiency. anyway, it doesn't matter, i just want to know what happens first, even if the returned results are the same
agnieszka
SQL Server will most certainly make the right decision. In your case, it has a perfect way to boost performance by applying where WHERE predicate to the source table before the join happens, so it will do that. Leave off the WHERE clause, or do something more obscure in it, and the full table will be joined instead. As always - it depends on what you do. There is no general "yes" or "no" answer to your question, and the query optimizer is pretty smart at figuring out the fastest way of answering your query and making a dynamic decision.
Tomalak
great this is what i wanted to know - the answer is "you cannot tell as it may differ"
agnieszka
Exactly. It can change, too, as the database evolves (dml changes, more data, indexes, stats, etc.)
Michael Haren
That's why I said things like "generally speaking..." and "The more complex the query gets..." and included an edge case where it actually makes a visible difference. All other times (where it makes no visible difference) I usually let the query optimizer do it's thing. I did not encounter many occasions where supplying join hints was necessary for speed or correctness.
Tomalak
A: 

When the query is exercised as a merge join, filtering and joining occur simultaneously.

AlexKuznetsov