tags:

views:

47

answers:

4

I've written the same query as a subquery and a self-join.

Is there any obvious argument for one over the other here?

SUBQUERY:

SELECT prod_id, prod_name
FROM products
WHERE vend_id = (SELECT vend_id
FROM products
WHERE prod_id = ‘DTNTR’);

SELF-JOIN:

SELECT p1.prod_id, p1.prod_name
FROM products p1, products p2
WHERE p1.vend_id = p2.vend_id
AND p2.prod_id = ‘DTNTR’;
+3  A: 

First query may throw error if the subquery returns more than a value

Second query is not as per ANSI

So better use ANSI style join

SELECT p1.prod_id, p1.prod_name 
FROM products as p1 inner join products as p2 
on p1.vend_id = p2.vend_id 
WHERE p2.prod_id = ‘DTNTR’; 
Madhivanan
A: 

If you don't need any of the columns from the JOINed table, then using a subselect is generally preferable, although this is dependent on RDBMs type. An IN clause should be used instead:

SELECT prod_id, prod_name
FROM products
WHERE vend_id IN (SELECT vend_id
   FROM products
   WHERE prod_id = ‘DTNTR’);
mdma
+2  A: 

This post has some figures on execution times. The poster states:

The first query shows 49.2% of the batch while the second shows 50.8%, leading one to think that the subquery is marginally faster.

Now, I started up Profiler and ran both queries. The first query required over 92,000 reads to execute, but the one with the join required only 2300, leading me to believe that the inner join is significantly faster.

There are conflicting responses though:

My rule of thumb: only use JOIN's if you need to output a column from the table you are join'ing to; otherwise, use sub-queries.

and this:

Joining should always be faster - theoretically and realistically. Subqueries - particularly correlated - can be very difficult to optimise. If you think about it you will see why - technically, the subquery could be executed once for each row of the outer query - blech!

I also agree with Madhivanan, if the sub query returns anything but one value your main query will fail, so use IN instead.

ChrisF