views:

195

answers:

3

When joining across tables (as in the examples below), is there an efficiency difference between joining on the tables or joining subqueries containing only the needed columns?

In other words, is there a difference in efficiency between these two tables?

SELECT result
  FROM result_tbl
  JOIN test_tbl                    USING (test_id)
  JOIN sample_tbl                  USING (sample_id)
  JOIN (SELECT request_id
          FROM request_tbl
         WHERE request_status='A') USING(request_id)

vs

SELECT result
  FROM (SELECT result,  test_id   FROM result_tbl)
  JOIN (SELECT test_id, sample_id FROM test_tbl)   USING(test_id)
  JOIN (SELECT sample_id          FROM sample_tbl) USING(sample_id)
  JOIN (SELECT request_id
          FROM request_tbl
         WHERE request_status='A')                 USING(request_id)
+3  A: 

It doesn't matter. It may actually be WORSE since you are taking control away from the optimizer which generally knows best.

However, remember if you are doing a JOIN and only including a column from one of the tables that it is QUITE OFTEN better to re-write it as a series of EXISTS statements -- because that's what you really mean. JOINs (with some exceptions) will join matching rows which is a lot more work for the optimizer to do.

e.g.

SELECT t1.id1
  FROM table1 t1
 INNER JOIN table2 ON something = something

should almost always be

SELECT id1
  FROM table1 t1
 WHERE EXISTS( SELECT *
                 FROM table2
                WHERE something = something )

For simple queries the optimizer may reduce the query plans into identical ones. Check it out on your DBMS.

Also this is a code smell and probably should be changed:

JOIN (SELECT request_id FROM request_tbl WHERE request_status='A')

to

SELECT result
  FROM request
 WHERE EXISTS(...)
   AND request_status = 'A'
Matt Rogish
+5  A: 

The only way to find out for sure is to run both with tracing turned on and then look at the trace file. But in all probability they will be treated the same: the optimizer will merge all the inline views into the main statement and come up with the same query plan.

Tony Andrews
+2  A: 

No difference.

You can tell by running EXPLAIN PLAN on both those statements - Oracle knows that all you want is the "result" column, so it only does the minimum necessary to get the data it needs - you should find that the plans will be identical.

The Oracle optimiser does, sometimes, "materialize" a subquery (i.e. run the subquery and keep the results in memory for later reuse), but this is rare and only occurs when the optimiser believes this will result in a performance improvement; in any case, Oracle will do this "materialization" whether you specified the columns in the subqueries or not.

Obviously if the only place the "results" column is stored is in the blocks (along with the rest of the data), Oracle has to visit those blocks - but it will only keep the relevant info (the "result" column and other relevant columns, e.g. "test_id") in memory when processing the query.

Jeffrey Kemp