views:

1114

answers:

4

Hi Folks,

I've two tables: TableA and TableB, joined by TableA.TableA_Id->1..n<-TableB.TableA_Id. A simple PK-FK.

I need to extract distinct TableA records given a certain condition on TableB. Here's my 1st approach:

SELECT * FROM TableA A INNER JOIN TableB B ON A.idA = B.IdA AND B.Date = '2009-01-10' ORDER BY A.Id;

This is nice, but it doesn't give me "distinct" records. Some records on table B may be duplicate and hence I could get the same records more than once.

So I decided to perform a subselect (performance is not an issue given that the subselect will probably end up with 20/30 records max):

SELECT * FROM TableA WHERE TableA.Id IN ( SELECT DISTINCT IdA FROM TableB WHERE Date = '20090110' ) ORDER BY TableA.IdA;

This works fine.

Now the question is: how can I use the Inner Join and still get the distinct values? Is this possible in one pass or the nested query is a must? What am I missing?

+1  A: 

And the problem using

SELECT DISTINCT * FROM TableA A INNER JOIN TableB B ON A.idA = B.IdA 
AND B.Date = '2009-01-10' ORDER BY A.Id;

is?

If it's that it does return duplicate idA values, that's because you are selecting too many columns, if you can't reduce what you select, you need the subquery.

Vinko Vrsalovic
'Distinct *' is actually '*', if there are no exact duplicates.
Adeel Ansari
Exactly, that's what I'm saying.
Vinko Vrsalovic
Especially if these are large tables, the derived table approach where the records from tableb are limited before the join may save time. That's one reason to try it. ALso since you aren't limiting the column list to table a columns, it still won't give the correct answer.
HLGEM
The subquery will return only the id's, hence why it is "DISTINCT IdA". Of course a Distinct * will return probably the same as * unless all the records are the same (which is not the case). Thanks.
Martín Marconcini
+2  A: 

use a derived table

SELECT * FROM TableA 
JOIN
(SELECT DISTINCT IdA FROM TableB WHERE Date = '20090110') a
ON a.IDA = TAbleA.IDA
ORDER BY TableA.IdA
HLGEM
A: 

How about this :

select * from TableA a 
  where a.TableA_Id in
(select TableA_Id from TableB where [Date] = '2009-01-10')
order by a.TableA_Id

You could add distinct to the subquery if you want, I'm not sure whether this will improve or reduce preformance offhand.

inferis
+1  A: 

I think a normal exists statement is what you need:

SELECT * 
FROM TableA A 
WHERE Exists( select B.IdA from TableB B where A.IdA = B.IdA and B.Date = '2009-01-10' )
ORDER BY A.Id;

Performance-wise it should be the best approach.

If you need values from the other table, and to avoid using distinct, you could join to a sub query that is grouped like so:

Select TableA.*, groupedTableB.OtherData 
From TableA
Inner join 
(
    select TableB.IdA, Sum(TableB.Data) SummaryData
    from TableB where TableB.Date = '2009-01-10'
    group by TableB.IdA

) groupedTableB on groupedTableB.IdA = TableA.IdA
Dan
Just so you know, that second on is not a subquery, it is a derived table.
HLGEM
Thanks for the idea. Will try this too
Martín Marconcini