tags:

views:

226

answers:

3

I am trying to compare two tables to find rows in each table that is not in the other. Table 1 has a groupby column to create 2 sets of data within table one.

groupby     number
----------- -----------
1           1
1           2
2           1
2           2
2           4

Table 2 has only one column.

number
-----------
1
3
4

So Table 1 has the values 1,2,4 in group 2 and Table 2 has the values 1,3,4.

I expect the following result when joining for Group 2:

`Table 1 LEFT OUTER Join Table 2`
T1_Groupby  T1_Number   T2_Number
----------- ----------- -----------
2           2           NULL

`Table 2 LEFT OUTER Join Table 1`
T1_Groupby  T1_Number   T2_Number
----------- ----------- -----------
NULL        NULL        3

The only way I can get this to work is if I put a where clause for the first join:

PRINT 'Table 1 LEFT OUTER Join Table 2, with WHERE clause'
select  table1.groupby as [T1_Groupby],
        table1.number as [T1_Number],
        table2.number as [T2_Number]
from    table1
        LEFT OUTER join table2
        --******************************
        on table1.number = table2.number
        --******************************
WHERE   table1.groupby = 2
    AND table2.number IS NULL

and a filter in the ON for the second:

PRINT 'Table 2 LEFT OUTER Join Table 1, with ON clause'
select  table1.groupby as [T1_Groupby],
        table1.number as [T1_Number],
        table2.number as [T2_Number]
from    table2
        LEFT OUTER join table1
            --******************************
            on table2.number = table1.number
            AND table1.groupby = 2
            --******************************
WHERE   table1.number IS NULL

Can anyone come up with a way of not using the filter in the on clause but in the where clause?

The context of this is I have a staging area in a database and I want to identify new records and records that have been deleted. The groupby field is the equivalent of a batchid for an extract and I am comparing the latest extract in a temp table to a the batch from yesterday stored in a partioneds table, which also has all the previously extracted batches as well. Code to create table 1 and 2:

create table table1 (number int, groupby int)
create table table2 (number int)
insert into table1 (number, groupby) values (1, 1)
insert into table1 (number, groupby) values (2, 1)
insert into table1 (number, groupby) values (1, 2)
insert into table2 (number) values (1)
insert into table1 (number, groupby) values (2, 2)
insert into table2 (number) values (3)  
insert into table1 (number, groupby) values (4, 2)  
insert into table2 (number) values (4)  

EDIT:

A bit more context - depending on where I put the filter I different results. As stated above the where clause gives me the correct result in one state and the ON in the other. I am looking for a consistent way of doing this.

Where -

select  table1.groupby as [T1_Groupby],
        table1.number as [T1_Number],
        table2.number as [T2_Number]
from    table1
        LEFT OUTER join table2
            --******************************
            on table1.number = table2.number
            --******************************
WHERE   table1.groupby = 2 
    AND table2.number IS NULL

Result:

T1_Groupby  T1_Number   T2_Number
----------- ----------- -----------
2           2           NULL

On -

select  table1.groupby as [T1_Groupby],
        table1.number as [T1_Number],
        table2.number as [T2_Number]
from    table1
        LEFT OUTER join table2
            --******************************
            on table1.number = table2.number
            AND table1.groupby = 2
            --******************************
WHERE   table2.number IS NULL

Result:

T1_Groupby  T1_Number   T2_Number
----------- ----------- -----------
1           1           NULL
2           2           NULL
1           2           NULL

Where (table 2 this time) -

select  table1.groupby as [T1_Groupby],
        table1.number as [T1_Number],
        table2.number as [T2_Number]
from    table2
        LEFT OUTER join table1
            --******************************
            on table2.number = table1.number
            AND table1.groupby = 2
            --******************************
WHERE   table1.number IS NULL

Result:

T1_Groupby  T1_Number   T2_Number
----------- ----------- -----------
NULL        NULL        3

On -

select  table1.groupby as [T1_Groupby],
        table1.number as [T1_Number],
        table2.number as [T2_Number]
from    table2
        LEFT OUTER join table1
            --******************************
            on table2.number = table1.number
            --******************************
WHERE   table1.number IS NULL
    AND table1.groupby = 2

Result:

T1_Groupby  T1_Number   T2_Number
----------- ----------- -----------
(0) rows returned
+1  A: 

with LEFT OUTER JOINS, you must filter in the ON clause or use this:

WHERE
    (LeftJoinTable.ID IS NULL OR LeftJoinTable.Col1=YourFilter)

if you just filter in the WHERE:

WHERE 
    LeftJoinTable.Col1=YourFilter

you will discard the parent joining row whenever there isn't a LeftJoinTable.ID (making the join an INNER JOIN).

By placing the filter in the ON, you cause the elimination of the LEFT JOIN row but not the elimination of the parent joining row, that is just how it works.

EDIT base don OP's comment
the only way to filter a a LEFT OUTER JOIN table is in the ON clause, unless you want to use an OR like I show in the first code example above. There is nothing wrong filtering a LEFT OUTER JOIN in the ON clause, this is how you do it.

KM
I probably (hindsight being the perfect science) should have put this in the original question, but if you look at my comment above you will see that changing the filter from the where to the on first either joins gets you a different result. I am looking for a 'rule' when coding left outer joins that I should either use ON clauses or where clauses.
Greg Potter
+3  A: 

If you filter the left outer joined table in the WHERE clause then you are in effect creating an inner join

See also this wiki page: WHERE conditions on a LEFT JOIN

SQLMenace
Thanks for the wiki link.
Greg Potter
+1  A: 

As the query is written, it makes sense to put the join in the ON clause, since you specifically only want to join on values in group '2' from table 1.

The alternative is to prefilter table1 to the group you are interested in, like this

select  t1Group.groupby,
        t1Group.number as [T1_Number],
        table2.number as [T2_Number]
from    table2
        LEFT OUTER join (SELECT * FROM table1 WHERE groupby=2) t1Group
            on table2.number = t1Group.number
WHERE   t1Group.number IS NULL
mdma
+1 the key to understanding why filtering in the "ON" clause works is to take the filter into an inline view like this.
araqnid