views:

94

answers:

2

Perhaps a dumb question, but consider these 2 tables :

T1                       
Store    Year
01  2009
02  2009
03  2009
01  2010
02  2010
03  2010

T2
Store
02

Why is this INNER JOIN giving me the results I want (filtering the [year] in the ON clause) :

select t1.*
from t1
inner join t2
on t1.store = t2.store
and t1.[year] = '2009'

Store    Year
02  2009

And why the LEFT OUTER JOIN include records of year 2010 ?

select t1.*
from t1
left outer join t2
on t1.store = t2.store
and t1.year = '2009'
where t2.store is null

01  2009
03  2009
01  2010
02  2010
03  2010

And I have to write the [year] filter in the 'WHERE' clause :

select t1.*
from t1
left outer join t2
on t1.store = t2.store
where t2.store is null
and t1.year = '2009'

01  2009
03  2009

Like I said, perhaps a dumb question, but it's bugging me !

A: 

Your second query returns all the results from the first table, except the ones matched in the join (whose results can be seen in the first query.) This might be easier for you to visualize if you take away the "where t2.store is null" clause, and add "t2.Store" to the selected columns. The results you should see then are:

01 2009 null
02 2009 02
03 2009 null
01 2010 null
02 2010 null
03 2010 null

As you can see, if you then filter to just the rows where t2.store is null, all you're doing is subtracting the second row (the only one that matched the join clause.)

Rick
again, thanks for your input, but the 1st answer explains the "why"
Erwin
That's fine. As an aside, note that in StackOverflow "1st answer" doesn't mean what you think it means. I posted 15 minutes before the one you mean, so his was the "3rd" post initially. However it now shows as first because it was upvoted.It's easier to refer to answers by the author, e.g.: "@Unreason's post is a good explanation of why."
Rick
+1  A: 

If you go by the definition of the LEFT JOIN that

  • for each row from the left side table it will return a matching row from the right side table if one exists
  • if no rows exist on the right side it will still return a row from a left side table with all columns from the right side table set to NULL

Now realize that the exist is equivalent to ON condition evaluates to true and it all makes sense.

For row where t1.year is 2010 the on expression evaluates to false (x AND 2010=2009 is false for all rows where t1.year = 2010), but since it is a left join the row from the left table will still be returned (according to the definition).

Therefore this kind of condition can only be written as where condition and not join condition.

(in general case it need not be table but a select expression)

EDIT: As Erwin interestingly pointed out, the where condition can be turned into JOIN with

select t1.*
from t1
left outer join t2
on t1.store = t2.store
or t1.year = '2009'
where t2.store is null

because:

t1.store t1.year t2.store t1.store=t2.store t1.year=2009  join(OR)
01       2009    02       false             true          true
02       2009    02       true              true          true
03       2009    02       false             true          true
01       2010    02       false             false         false
02       2010    02       true              false         true
03       2010    02       false             false         false

so only the rows in which join column in the above table is false will return nulls in t2.* fields.

What I meant to say is that this kind of condition can not be turned into a pure join because the way left join works (still returns records even when join condition is false), as is possible with inner join conditions (can be turned into pure joins and vice versa).

Regarding the proposed query - my advice would be not to use it.

Queries with ORs perform worse then queries with AND conditions as a general rule (OR expand the result set; ANDs restrict). This applies both to join conditions and to where conditions.

Your query would perform worse then query that has t1.year = '2009' in the where condition, because in later case it would be possible to use an index (if one exists), because if you join like you do, you are basically artificially joining records from one table with records from another just so that your where condition filter only the records you need. It should be more effective to only get the records from t1 that have 2009 to begin with (assuming there's an index on year and selectivity is high enough that will happen with the WHERE condition).

Both of these suggestions regarding performance can and should be verified by inspecting the query plan.

Finally, the query is a bit obscure - it is not immediately obvious that in case of t1.year = '2009' the join becomes Cartesian product (which is filtered out later). So, if the assumption that simple LEFT JOIN with where t1.year = 2009 AND t2.store is null perform better AND is more readable, I would not use this query.

Unreason
Thank you, that is a perfect answer !
Erwin
Sorry I can't vote for this question by lack of unsufficient reputation, but as soon I'll earn my points, I'll vote for your answer
Erwin
following up on your answer, I did a little experiment : select t1.* from t1 left outer join t2 on t1.store = t2.store -- using OR instead of ANDOR t1.year = '2009' where t2.store is null gave me the right answer !
Erwin
@Erwin: It does return the same result, but it is a bit obscure and on average it would perform worse than WHERE variant. I'll update my answer to clean up the last statement and the general advice.
Unreason
Hi ! Of course the t1.year = 2009 should be used in the WHERE clause, I only wanted to test your explanation and it passed briliantly ! Tnx for further elaborating your answer !
Erwin