tags:

views:

585

answers:

5

I have two tables which I want to join together using a left outer join. However, even though my left table contains only unique values, the right table satisfies the CONDITION more than once and as such, adds extra rows to the resultset.

Code to replicate problem:

declare @tb1 table (c1 int)
declare @tb2 table (c2 int)

INSERT INTO @tb1 VALUES (1)
INSERT INTO @tb1 VALUES (2)
INSERT INTO @tb1 VALUES (3)
INSERT INTO @tb1 VALUES (4)

INSERT INTO @tb2 VALUES (3)
INSERT INTO @tb2 VALUES (4)
INSERT INTO @tb2 VALUES (5)
INSERT INTO @tb2 VALUES (6)

select * from @tb1 left outer join @tb2 ON c1 = c2

INSERT INTO @tb2 VALUES (3)
INSERT INTO @tb2 VALUES (4)
INSERT INTO @tb2 VALUES (5)
INSERT INTO @tb2 VALUES (6)

select * from @tb1 left outer join @tb2 ON c1 = c2

As you can see the first SELECT returns 4 rows, where the second SELECT 6, although the left table remains unchanged.

How does one stay strict to the left table, and only use the right table to COMPLEMENT the rows from the left table?

HELP!

RESULTS:
c1          c2
----------- -----------
1           NULL
2           NULL
3           3
4           4

[DUPLICATE @tb2 records]

c1          c2
----------- -----------
1           NULL
2           NULL
3           3
3           3
4           4
4           4
+2  A: 

select distinct * from @tb1 left outer join @tb2 ON c1 = c2

Gennady Shumakher
From 100 ms EXECUTION time, this shot it up to 83seconds.
Theofanis Pantelides
You need to give us more detail of what you're trying to achieve
Murph
How big is the table that took 83 seconds? Are you using the LIKE operator in your real query?
Wez
See comment on top
Theofanis Pantelides
13,166,165 ROWS, with 200,000+ /day
Theofanis Pantelides
+2  A: 

Try useing

select DISTINCT * from @tb1 left outer join @tb2 ON c1 = c2
astander
That would work in this simplified case, but I'm pretty sure in reality his rhs table has more columns - and then distinct won't help.
Tor Haugen
I aggree, and in that case then he needs to decide what makes the row "distinct", or will have to display multiple rows with the id from table 1
astander
how about select distinct on ( col, col)
xenoterracide
+1  A: 

If you want to keep just single rows on the left hand side, you'll need to decide what you want to show on the right, for each unique value on the left. If you want to show a count, for example, you could do this:

select b1.c1, x.c from @tb1 b1 
left outer join 
(
  select c2, count(*) as c 
  from @tb2
  group by c2
) as x 
ON b1.c1 = x.c2

or if you just want one occurence of values from c2:

select b1.c1, x.c2 from @tb1 b1 
left outer join 
(
  select c2
  from @tb2
  group by c2
) as x 
ON b1.c1 = x.c2
davek
The example above is only an example, and as such each table only has one column.
Theofanis Pantelides
@Davek - thanks for this answer. I'm joining 5 tables with > 250K records and this answer was exactly what I needed.
TMG
+2  A: 

Hmm, the query is doing what its supposed to since there are duplicate records (or at least duplicate identifiers) in the right hand table.

To get the effect you want something like:

SELECT * FROM @tb1 LEFT OUTER JOIN (SELECT DISTINCT c2 FROM @tb2) t2 ON @tb1.c1 = t2.c2

If that isn't sufficient you'll need to explain the requirement in a bit more detail.

Murph
+5  A: 

Sorry, but your thinking is skewed.

Think about it this way: if you only want one single row from tb2 for each row in tb1, which one should the server choose? The fact is that from the definition of a join, every row in the right-hand-side table that matches the left-hand-side row is a match and must be included.

You'll have to ensure tbl2 has distinct values for c2 before the join. Murph's suggestion might do it, provided your SQL variant supports DISTINCT [column] (not all do).

Tor Haugen
Yeah, I was afraid of that.
Theofanis Pantelides
This was absolutely right, and as such I have altered my thinking and hence my ON statement to be unique, alleviating the problem.
Theofanis Pantelides