views:

1425

answers:

5

I am trying to understand how does joins work internally. What will be the difference between the way in which the following two queries would run?

For example

(A)

Select * 
FROM TABLE1
FULL JOIN TABLE2 ON TABLE1.ID = TABLE2.ID
FULL JOIN TABLE3 ON TABLE1.ID = TABLE3.ID

And

(B)

Select * 
FROM TABLE1
FULL JOIN TABLE2 ON TABLE1.ID = TABLE2.ID
FULL JOIN TABLE3 ON TABLE2.ID = TABLE3.ID

Edit: I am talking about oracle here. Consider some records present in table 2 and table 3 but not in table 1, query A would give two rows for that record but B would give only one row.

A: 

I think, how it works internally, depends on the database system you use.

Peter
+7  A: 

Your DBMS's optimiser will determine how best to perform the query. Usually this is done by "cost based optimisation", where a number of different query plans are considered and the most efficient one selected. If your two queries are logically identical, it is most likely that the optimiser will end up using the same query plan whichever way you write it. In fact, it would be a poor optimiser these days that produced different query plans based on such minor differences in the SQL.

However, full outer joins are a different matter (in Oracle at least), since the way the columns are joined influences the result. i.e. the 2 queries are not interchangeable.

You can use AUTOTRACE in SQL Plus to see the different plans:

SQL> select *
  2  from t1
  3  full join t2 on t2.id = t1.id
  4  full join t3 on t3.id = t2.id;

        ID         ID         ID
---------- ---------- ----------
                    1          1

1 row selected.


Execution Plan
----------------------------------------------------------

---------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes | Cost (%CPU)|
---------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |     3 |   117 |    29  (11)|
|   1 |  VIEW                   |      |     3 |   117 |    29  (11)|
|   2 |   UNION-ALL             |      |       |       |            |
|*  3 |    HASH JOIN OUTER      |      |     2 |   142 |    15  (14)|
|   4 |     VIEW                |      |     2 |    90 |    11  (10)|
|   5 |      UNION-ALL          |      |       |       |            |
|*  6 |       HASH JOIN OUTER   |      |     1 |    91 |     6  (17)|
|   7 |        TABLE ACCESS FULL| T1   |     1 |    52 |     2   (0)|
|   8 |        TABLE ACCESS FULL| T2   |     1 |    39 |     3   (0)|
|*  9 |       HASH JOIN ANTI    |      |     1 |    26 |     6  (17)|
|  10 |        TABLE ACCESS FULL| T2   |     1 |    13 |     3   (0)|
|  11 |        TABLE ACCESS FULL| T1   |     1 |    13 |     2   (0)|
|  12 |     TABLE ACCESS FULL   | T3   |     1 |    26 |     3   (0)|
|* 13 |    HASH JOIN ANTI       |      |     1 |    26 |    15  (14)|
|  14 |     TABLE ACCESS FULL   | T3   |     1 |    13 |     3   (0)|
|  15 |     VIEW                |      |     2 |    26 |    11  (10)|
|  16 |      UNION-ALL          |      |       |       |            |
|* 17 |       HASH JOIN OUTER   |      |     1 |    39 |     6  (17)|
|  18 |        TABLE ACCESS FULL| T1   |     1 |    26 |     2   (0)|
|  19 |        TABLE ACCESS FULL| T2   |     1 |    13 |     3   (0)|
|* 20 |       HASH JOIN ANTI    |      |     1 |    26 |     6  (17)|
|  21 |        TABLE ACCESS FULL| T2   |     1 |    13 |     3   (0)|
|  22 |        TABLE ACCESS FULL| T1   |     1 |    13 |     2   (0)|
---------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("T3"."ID"(+)="T2"."ID")
   6 - access("T2"."ID"(+)="T1"."ID")
   9 - access("T2"."ID"="T1"."ID")
  13 - access("T3"."ID"="T2"."ID")
  17 - access("T2"."ID"(+)="T1"."ID")
  20 - access("T2"."ID"="T1"."ID")

SQL> select *
  2  from t1
  3  full join t2 on t2.id = t1.id
  4  full join t3 on t3.id = t1.id;

        ID         ID         ID
---------- ---------- ----------
                    1
                               1

2 rows selected.


Execution Plan
----------------------------------------------------------

---------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes | Cost (%CPU)|
---------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |     3 |   117 |    29  (11)|
|   1 |  VIEW                   |      |     3 |   117 |    29  (11)|
|   2 |   UNION-ALL             |      |       |       |            |
|*  3 |    HASH JOIN OUTER      |      |     2 |   142 |    15  (14)|
|   4 |     VIEW                |      |     2 |    90 |    11  (10)|
|   5 |      UNION-ALL          |      |       |       |            |
|*  6 |       HASH JOIN OUTER   |      |     1 |    91 |     6  (17)|
|   7 |        TABLE ACCESS FULL| T1   |     1 |    52 |     2   (0)|
|   8 |        TABLE ACCESS FULL| T2   |     1 |    39 |     3   (0)|
|*  9 |       HASH JOIN ANTI    |      |     1 |    26 |     6  (17)|
|  10 |        TABLE ACCESS FULL| T2   |     1 |    13 |     3   (0)|
|  11 |        TABLE ACCESS FULL| T1   |     1 |    13 |     2   (0)|
|  12 |     TABLE ACCESS FULL   | T3   |     1 |    26 |     3   (0)|
|* 13 |    HASH JOIN ANTI       |      |     1 |    26 |    15  (14)|
|  14 |     TABLE ACCESS FULL   | T3   |     1 |    13 |     3   (0)|
|  15 |     VIEW                |      |     2 |    26 |    11  (10)|
|  16 |      UNION-ALL          |      |       |       |            |
|* 17 |       HASH JOIN OUTER   |      |     1 |    39 |     6  (17)|
|  18 |        TABLE ACCESS FULL| T1   |     1 |    26 |     2   (0)|
|  19 |        TABLE ACCESS FULL| T2   |     1 |    13 |     3   (0)|
|* 20 |       HASH JOIN ANTI    |      |     1 |    26 |     6  (17)|
|  21 |        TABLE ACCESS FULL| T2   |     1 |    13 |     3   (0)|
|  22 |        TABLE ACCESS FULL| T1   |     1 |    13 |     2   (0)|
---------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("T3"."ID"(+)="T1"."ID")
   6 - access("T2"."ID"(+)="T1"."ID")
   9 - access("T2"."ID"="T1"."ID")
  13 - access("T3"."ID"="T1"."ID")
  17 - access("T2"."ID"(+)="T1"."ID")
  20 - access("T2"."ID"="T1"."ID")

In fact, the query plans are identical except for the Predicate information

Tony Andrews
Yeah but how this is excecuted would also determine the output we get. Consider some records present in table 2 and table 3 but not in table 1, query A would give two rows for that record but B would give only one row.
MOZILLA
The method of execution should NEVER EVER EVER affect the final results of a correctly formed SQL statement. If it does, then your DBMS has a bug. The reason that they would give you different results is because they ask two different questions.
Tom H.
True, I hadn't taken note of the FULL keyword, I was thinking of inner joins.
Tony Andrews
+1  A: 

You stated interest in "internals", and then asked an example that illustrates "semantics". I'm answering semantics.

Consider these tables.

Table1 : 1, 4, 6
Table2 : 2, 4, 5
Table3 : 3, 5, 6

Both examples perform the same join first, so I'll perform that here.

FirstResult = T1 FULL JOIN T2 : (T1, T2)
(1, null)
(4, 4)
(6, null)
(null, 2)
(null, 5)

Example (A)

FirstResult FULL JOIN T3 ON FirstItem : (T1, T2, T3)

(1, null, null)
(4, 4, null)
(6, null, 6)   <----
(null, 2, null)
(null, 5, null)   <----
(null, null, 3)

Example (B)

FirstResult FULL JOIN T3 ON SecondItem : (T1, T2, T3)
(1, null, null)
(4, 4, null)
(6, null, null)   <----
(null, 2, null)
(null, 5, 5)   <----
(null, null, 3)

This shows you logically how to produce the results from the joins.

For "internals", there's something called a query optimizer, which will produce these same results - but it will make implementation choices to do the computation/io fast. These choices include:

  • which tables to access first
  • look into a table using an index or table scan
  • which join implementation type to use (nested loop, merge, hash).

Also note: due to the optimizer making these choices, and changing these choices based on what it considers to be optimal - the order of the results can change. The default ordering of results is always "what is easiest". If you don't want the default ordering, you need to specify ordering in your query.

To see exactly what the optimizer will do with a query (at that moment, because it can change its mind), you need to view the execution plan.

David B
A: 

With query A what you get includes entries in table 1 with a corresponding entry in table3 without corresponding entries in table3 (nulls for t2 columns)

With query B uou don't get those entries because you only go to table3 through table2. If you don't have a corresponding entry in table2, the table2.id will be null and will never match a table3.id

Gary