For example, I have the following tables:

animals
-----------------------
animal_id | animal_name
-----------------------

owners
-----------------------
owner_id | owner_name
-----------------------

owners_animals
--------------------
owner_id | animal_id
--------------------

I want to find the animals with no owners, so I run this query:

select animal_name 
from (select * from animals) as a 
    left join (select * from owners_animals) as o on (a.animal_id = o.animal_id) 
where owner_id is NULL

Is this way of filtering data using a join acceptable and safe? With the same schema, is there a better alternative to get the same result?

+4  A: 

Use a Not Exists clause:

Select animal_name 
From animals as a 
Where Not Exists(Select 1
                 From owners_animals oa
                 Where oa.animal_id = a.animal_id)

Also, put an index on owners_animals.animal_id to make this filter as fast as possible.
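
A minimal sketch of the above, run against an in-memory SQLite database rather than PostgreSQL so that it is self-contained. The sample rows and the index name `idx_owners_animals_animal_id` are assumptions made up for illustration. It only checks that the `NOT EXISTS` form and the `LEFT JOIN ... IS NULL` form return the same rows; it says nothing about how PostgreSQL plans either one.

```python
import sqlite3

# In-memory database with the schema from the question (sample data is hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE animals (animal_id INTEGER PRIMARY KEY, animal_name TEXT);
    CREATE TABLE owners (owner_id INTEGER PRIMARY KEY, owner_name TEXT);
    CREATE TABLE owners_animals (owner_id INTEGER, animal_id INTEGER);
    -- Index suggested in the answer; the name is an assumption.
    CREATE INDEX idx_owners_animals_animal_id ON owners_animals (animal_id);

    INSERT INTO animals VALUES (1, 'Rex'), (2, 'Tom'), (3, 'Polly');
    INSERT INTO owners VALUES (10, 'Ann'), (11, 'Bob');
    -- Rex has two owners, Polly has one, Tom has none.
    INSERT INTO owners_animals VALUES (10, 1), (11, 1), (10, 3);
""")

# Anti-join written with NOT EXISTS and a correlated subquery.
not_exists_rows = conn.execute("""
    SELECT animal_name
    FROM animals AS a
    WHERE NOT EXISTS (SELECT 1
                      FROM owners_animals AS oa
                      WHERE oa.animal_id = a.animal_id)
    ORDER BY animal_name
""").fetchall()

# The same anti-join written as LEFT JOIN ... IS NULL.
left_join_rows = conn.execute("""
    SELECT a.animal_name
    FROM animals AS a
    LEFT JOIN owners_animals AS oa ON a.animal_id = oa.animal_id
    WHERE oa.owner_id IS NULL
    ORDER BY a.animal_name
""").fetchall()

print(not_exists_rows)
print(left_join_rows)
```

To compare the plans on a real server, prefixing each query with `EXPLAIN` in PostgreSQL would be the direct way to check.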

Nick Craver
Ignore my post - Nick's is better.
Steve Homer
This form's a bit harder to follow, but (with proper indexing) quicker to run.
Philip Kelley
Not quicker to run; they both get planned the same. I find it easier to follow, by the way: there is no hanging artifact in a remote part of the query, and the `not exists` makes the intention very clear.
Evan Carroll
@Evan - I don't think that's necessarily true for PostgreSQL. I think it actually does the join, but I would need to test to be sure and I don't have a server handy. Can anyone test?
Nick Craver
Quite sure it is planned the same. The one you have to watch out for is `NOT IN ()`, not `NOT EXISTS ()`; both `NOT EXISTS ()` and the `LEFT OUTER JOIN ... IS NULL` will get planned the same.
Evan Carroll
I'll buy that. I've always used exists out of a readability habit so I have no clue, but it makes sense that they've improved the optimizer...it has been 3 versions since I've used PostgreSQL, after all.
Nick Craver
My understanding is that with a left outer join, the database system may have to fully process the outer join (subquery) before it can properly process the outer query, whereas with the "NOT EXISTS" the query engine knows that it can ditch an "outer" row as soon as any one value is found by the inner/subquery. No perceptible difference on small tables, but a potentially large difference on huge ones.
Philip Kelley
Wait, no, I think I got that wrong (I'm still figuring this one out). I think the real time-saver is in having that "not exists" clause be a correlated subquery. I wish I could locate the SO post that discussed this...
Philip Kelley
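
On the `NOT IN ()` caveat raised in the comments above: separately from how it is planned, `NOT IN` also behaves differently from `NOT EXISTS` when the subquery can return NULL. The sketch below illustrates this under stated assumptions: it uses an in-memory SQLite database instead of PostgreSQL, and the sample rows (including a hypothetical `owners_animals` row with a NULL `animal_id`) are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE animals (animal_id INTEGER PRIMARY KEY, animal_name TEXT);
    CREATE TABLE owners_animals (owner_id INTEGER, animal_id INTEGER);

    INSERT INTO animals VALUES (1, 'Rex'), (2, 'Tom');
    -- Hypothetical data: one link row has a NULL animal_id.
    INSERT INTO owners_animals VALUES (10, 1), (11, NULL);
""")

# NOT IN: for Tom, "2 NOT IN (1, NULL)" evaluates to NULL, so the row is filtered out.
not_in_rows = conn.execute("""
    SELECT animal_name
    FROM animals
    WHERE animal_id NOT IN (SELECT animal_id FROM owners_animals)
""").fetchall()

# NOT EXISTS: the NULL row never matches the correlation, so Tom is returned.
not_exists_rows = conn.execute("""
    SELECT animal_name
    FROM animals AS a
    WHERE NOT EXISTS (SELECT 1
                      FROM owners_animals AS oa
                      WHERE oa.animal_id = a.animal_id)
""").fetchall()

print(not_in_rows)
print(not_exists_rows)
```

If `owners_animals.animal_id` is declared `NOT NULL`, this particular difference does not arise.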
+2  A: 

Assuming there's nothing Postgres-specific going on (I'm not familiar with Postgres), the following is easier to follow.

Select *
From animals a
    left outer join owners_animals oa On a.animal_id = oa.animal_id
Where oa.owner_id is NULL
Steve Homer
+1 for a (likely) better performing solution than using a correlated subquery. I think you have a minor typo: `o.animal_id` should be `oa.animal_id`
Adam Bernier
Fixed - thanks.
Steve Homer
No performance difference.
Evan Carroll
A: 

Don't ever write FROM (SELECT * FROM table); just write FROM table. The same goes for the LEFT JOIN. What you wrote is just an overly verbose version of:

SELECT animal_name
FROM animals
LEFT JOIN owners_animals
  USING ( animal_id )
WHERE owner_id IS NULL;

With that said, I often like the NOT EXISTS() option, because it keeps the owner_id IS NULL fragment out.

USING (foo) is the same as joining on t1.foo = t2.foo for the two joined tables, except that only one foo column will be in the result set.
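
A small sketch of that difference, assuming an in-memory SQLite database as a stand-in for PostgreSQL (SQLite also supports `USING`). The sample rows are invented for illustration. It compares the column lists returned by `SELECT *` for the `USING` form and the `ON` form of the same join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE animals (animal_id INTEGER PRIMARY KEY, animal_name TEXT);
    CREATE TABLE owners_animals (owner_id INTEGER, animal_id INTEGER);

    -- Hypothetical sample data: Rex has an owner, Tom does not.
    INSERT INTO animals VALUES (1, 'Rex'), (2, 'Tom');
    INSERT INTO owners_animals VALUES (10, 1);
""")

# Join with USING: the shared column should appear once in SELECT *.
cur = conn.execute("""
    SELECT *
    FROM animals
    LEFT JOIN owners_animals USING (animal_id)
""")
using_cols = [col[0] for col in cur.description]

# Join with ON: both tables contribute their own animal_id column.
cur = conn.execute("""
    SELECT *
    FROM animals AS a
    LEFT JOIN owners_animals AS oa ON a.animal_id = oa.animal_id
""")
on_cols = [col[0] for col in cur.description]

# The filter from the answer works unchanged with USING.
unowned = conn.execute("""
    SELECT animal_name
    FROM animals
    LEFT JOIN owners_animals USING (animal_id)
    WHERE owner_id IS NULL
""").fetchall()

print(using_cols)
print(on_cols)
print(unowned)
```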

Evan Carroll
Is USING () supported across platforms? I've not seen it before.
Steve Homer
I believe it is in the spec; it has been in Pg forever.
Evan Carroll