views:

871

answers:

6

I am an old-school MySQL user and always preferred JOIN over sub-query. But nowadays everyone uses sub-query and I hate it, dunno why.
Though I've lack of theoretical knowledge to judge myself if there are any difference. Well, I am curious if sub-query as good as join and there is no thing to worry about?

+13  A: 

Sub-queries are the logically correct way to solve problems of the form, "Get facts from A, conditional on facts from B". In such instances, it makes more logical sense to stick B in a sub-query than to do a join. It is also safer, in a practical sense, since you don't have to be cautious about getting duplicated facts from A due to multiple matches against B.

Practically speaking, however, the answer usually comes down to performance. Some optimisers suck lemons when given a join vs a sub-query, and some suck lemons the other way, and this is optimiser-specific, DBMS-version-specific and query-specific.

Historically, explicit joins usually win, hence the established wisdom that joins are better, but optimisers are getting better all the time, and so I prefer to write queries first in a logically coherent way, and then restructure if performance constraints warrant this.

Marcelo Cantos
Great answer. I'd also add that developers (esp. amateur ones) are not always proficient in SQL.
Álvaro G. Vicario
A: 

Subqueries are generally used to return a single row as an atomic value, though they may be used to compare values against multiple rows with the IN keyword. They are allowed at nearly any meaningful point in a SQL statement, including the target list, the WHERE clause, and so on. A simple sub-query could be used as a search condition. For example, between a pair of tables:

SELECT title FROM books WHERE author_id = (SELECT id FROM authors WHERE last_name = 'Bar' AND first_name = 'Foo');

Note that using a normal value operator on the results of a sub-query requires that only one field must be returned. If you're interested in checking for the existence of a single value within a set of other values, use IN:

SELECT title FROM books (WHERE author_id IN (SELECT id FROM authors WHERE last_name ~ '^[A-E]');

This is obviously different from say a LEFT-JOIN where you just want to join stuff from table A and B even if the join-condition doesn't find any matching record in table B, etc.

If you're just worried about speed you'll have to check with your database and write a good query and see if there's any significant difference in performance.

rkulla
+10  A: 

Most of the Cases JOINS are faster then Sub-queries and it is very rare conditions sub-query become faster. In joins RDBMS can create an execution plan that suits better for your query and can predict what data should be loaded to be processed and save time unlike the sub-query where it should run all the queries and load all their data to do the processing

The good thing in sub-queries that they are more readable then joins, that's why most of new SQL people prefer them because it is the easy way. but when it comes for performance JOINS are better in most cases even though they are not hard to read too.

Kronass
Yes, most databases therefore includes it as an optimization step to convert subqueries into joins when it is analyzing your query.
Cine
This answer is a bit too simplified for the question that was asked. As you state: certain subqueries are ok and certain are not. The answer does not really help to distinguish the two. (also the 'very rare' really depends on your data/app).
Unreason
A: 

Sub-queries are mainly useful for when you actually do need to use 2 queries to find data. An example would be, select all people who's sales were above average. Well, first you have to find out the average (1 query there) and then you have to compare everyone's sales against that average (the second select).

As for Joins vs Subqueries, remember that no matter which one you pick that the statement with the subquery will be executing two select statements while the join will only be selecting one select statement, so by this attribute a join statement will always be faster... however I don't have any experience in actual real-world difference so don't take my word for it -- write two queries from a large DB and see which one comes back faster.

Nitrodist
+6  A: 

Use EXPLAIN to see how your database executes the query on your data. There is a huge "it depends" in this answer...

PostgreSQL can rewrite a subquery to a join or a join to a subquery when it thinks one is faster than the other. It all depends on the data, indexes, correlation, amount of data, query, etc.

Frank Heikens
+1  A: 

First of all, to compare the two first you should distinguish queries with subqueries to:

  1. a class of subqueries that always have corresponding equivalent query written with joins
  2. a class of subqueries that can not be rewritten using joins

For the first class of queries a good RDBMS will see joins and subquereis as equivalent and will produce same query plans.

These days even mysql does that.

Still, sometimes it does not, but this does not mean that joins will always win - I had cases when using subqueries in mysql improved performance. (For example if there is something preventing mysql planner to correctly estimate the cost and if the planner do not see the join-variant and subquery-variant as same then subqueries can outperform the joins by forcing a certain path).

Conclusion is that you should test your queries for both join and subquery variants if you want to be sure which one will perform better.

For the second class the comparison makes no sense as those queries can not be rewritten using joins and in these cases subqueries are natural way to do the required tasks and you should not discriminate against them.

Unreason