views:

21592

answers:

4

I have three tables, A, B, C, where A is many to one B, and B is many to one C. I'd like a list of all C's in A.

My tables are something like this: A[id, valueA, lookupB], B[id, valueB, lookupC], C[id, valueC]. I've written a query with two nested SELECTs, but I'm wondering if it's possible to do INNER JOIN with DISTINCT somehow.

SELECT valueC
FROM C
INNER JOIN
(
    SELECT DISTINCT lookupC
    FROM B INNER JOIN
    (
     SELECT DISTINCT lookupB
     FROM A
    ) 
    A2 ON B.id = A2.lookupB
) 
B2 ON C.id = B2.lookupC

EDIT: The tables are fairly large, A is 500k rows, B is 10k rows and C is 100 rows, so there are a lot of uneccesary info if I do a basic inner join and use DISTINCT in the end, like this:

SELECT DISTINCT valueC
FROM 
C INNER JOIN B on C.id = B.lookupB
INNER JOIN A on B.id = A.lookupB

This is very, very slow (magnitudes times slower than the nested SELECT I do above.

A: 

Is this what you mean?

SELECT DISTINCT C.valueC
FROM 
C
INNER JOIN B ON C.id = B.lookupC
INNER JOIN A ON B.id = A.lookupB
kristian
Unfortunately this seems to be very slow, since it needs to create huge join tables first. When I run my code it is quicker, which I guess is because it prunes the join tables.
Mats Fredriksson
I've reversed the join order which, depending on which SQL you are using, may improve the performance of the query.
kristian
Nah, reordering the joins didn't help that much, unfortunately..
Mats Fredriksson
+3  A: 

I believe your 1:m relationships should already implicitly create DISTINCT JOINs.

But, if you're goal is just C's in each A, it might be easier to just use DISTINCT on the outer-most query.

SELECT DISTINCT a.valueA, c.valueC
FROM C
    INNER JOIN B ON B.lookupC = C.id
    INNER JOIN A ON A.lookupB = B.id
ORDER BY a.valueA, c.valueC
Jonathan Lonowski
It turned out to be enough, the issue I had was that I didn't have an index in place.
Mats Fredriksson
A: 
SELECT DISTINCT C.valueC 
FROM C 
  LEFT JOIN B ON C.id = B.lookupC
  LEFT JOIN A ON B.id = A.lookupB
WHERE C.id IS NOT NULL

I don't see a good reason why you want to limit the result sets of A and B because what you want to have is a list of all C's that are referenced by A. I did a distinct on C.valueC because i guessed you wanted a unique list of C's.


EDIT: I agree with your argument. Even if your solution looks a bit nested it seems to be the best and fastest way to use your knowledge of the data and reduce the result sets.

There is no distinct join construct you could use so just stay with what you already have :)

VVS
The result is the same, but it seems to be slower. I don't know exactly why, but I would guess that it's because it creates two huge join tables to C.valueC will be very large. Since I know that B.lookupC should be DISTINCT by it self I want to use that knowledge to speed things up.
Mats Fredriksson
+3  A: 

I did a test on MS SQL 2005 using the following tables: A 400K rows, B 26K rows and C 450 rows.

The estimated query plan indicated that the basic inner join would be 3 times slower than the nested sub-queries, however when actually running the query, the basic inner join was twice as fast as the nested queries, The basic inner join took 297ms on very minimal server hardware.

What database are you using, and what times are you seeing? I'm thinking if you are seeing poor performance then it is probably an index problem.

Darrel Miller
It was an indexing issue. I'm using MSSQL 2005, and finally found the Database Engine Tuning Advisor. I added an index on A.lookupB and it sped things up dramatically. Thanks for your help!
Mats Fredriksson