views:

1548

answers:

8

I am learning SQL and am trying to learn JOINs this week.

I have gotten to the level where I can do three table joins, similar to a lot of examples I've seen. I'm still trying to figure out the tiny details of how things work. All the examples I've seen of three table joins use INNER JOINS only. What about LEFT and RIGHT JOINs? Do you ever use these in three table joins? What would it mean?

SELECT ~some columns~ FROM ~table name~
LEFT JOIN ~table 2~ ON ~criteria~
INNER JOIN ~table 3~ ON ~criteria~

or

SELECT ~some columns~ FROM ~table name~
INNER JOIN ~table 2~ ON ~criteria~
LEFT JOIN ~table 3~ ON ~criteria~

or

SELECT ~some columns~ FROM ~table name~
LEFT JOIN ~table 2~ ON ~criteria~
LEFT JOIN ~table 3~ ON ~criteria~

or

???

Just trying to explore the space as much as possible

A: 

This really depends on what you are doing. I've written many 3+ table queries that will have an outer join in them. It just depends on the data you are querying and what you are trying to follow.

The same general logic applies when selecting the join type when you have multiples as with single join queries.

Mitchel Sellers
+5  A: 

Joins are just ways of combining tables. Joining three tables is no different than joining 2... or 200. You can mix and match INNER, [LEFT/RIGHT/FULL] OUTER, and even CROSS joins as much as you want. The only difference is which results are kept: INNER joins only keep rows where both sides match the expression. OUTER joins pick an "origin" table depending on the LEFT/RIGHT/FULL specification, always keep all rows from the origin table, and supply NULL values for rows from the other side that don't match the expression. CROSS joins return all possible combinations of both sides.

The trick is that because you're working with declarative code rather than more-familiar iterative, the temptation is to try to think of it as if everything happens at once. When you do that, you try to wrap your head around the entire query and it can get confusing.

Instead, you want to think of it as if the joins happen in order, from the first table listed to the last. This actually is not how it works, because the query optimizer can re-order things to make them run faster. But it makes building the query easier for the developer.

So with three tables, you start with your base table, then join in the values you need from the next table, and the next, and so on, just like adding lines of code to a function to produce the required output.

As for using the different join types, I've used all the different types I listed here: INNER, LEFT OUTER, RIGHT OUTER, FULL OUTER, and CROSS. But most of those you only need to use occasionally. INNER JOIN and LEFT JOIN will cover probably 95% or more of what you want to do.

Now let's talk about performance. Often times the order you list tables is dictated to you: you start from TableA and you need to list TableB first in order to have access to columns required to join in TableC. But sometimes both TableB and TableC only depend on TableA, and you could list them in either order. When that happens the query optimizer will usually pick the best order for you, but sometimes it doesn't know how. Even if it did, it helps to have a good system for listing tables so you can always look at a query and know that it's "right".

With that in mind, you should think of a query in terms of the working set currently in memory as the query builds. When you start with TableA, the database looks at all the columns from TableA in the select list or anywhere else (like WHERE or ORDER BY clauses, or potential indexes) in the query, factors in relevant conditions from the WHERE clause, and loads the smallest portion of that table into memory that it can get away with. It does this for each table in turn, always loading as little as possible. And that's the key: you want to keep this working set as small as possible for as long as possible.

So, going back to our three-table join, we want to list the tables in the order that will keep the working set smaller for longer. This means listing the smaller table above the larger one. Another good rule of thumb is that INNER joins tend to shrink result sets, while OUTER joins, tend to grow result sets, and so you want to list your INNER joins first. However, this is not a requirement for a query to work, nor is it always true; sometimes the reverse can happen as well.

Finally, I want to point out again that this isn't how it really works. The query optimizer and execution plans are a very complex topic, and there are lots of tricks the database can take that break this model from time to time. It's just one model that you as a developer can use to help understand what the server is doing and help you write better queries.

Joel Coehoorn
this is a really excellent answer. Your comments on performance (paragraphs 7 and 8) are very illuminating. Is there an article or topic that I can read to learn more about this? I would like to understand your points in even finer detail.
MedicineMan
This is what I've derived from personal experience, primarily via a good mentor at a previous employer. This is mostly about formatting, though; the performance gained from this model is minimal if anything and the query optimizer is usually smarter about things than you are. I use it because it allows me to identify when my idea about what tables are more important changes or when perhaps I made a mistake crafting a query. I'll be able to spot out-of-order joins.
Joel Coehoorn
wow, that's a fantastic answer....+1 :-)...do you write blog anywhere? If you do, could you give me the address please?
Night Shade
+2  A: 

Selecting from three tables is no different from selecting from only two (or as many as a hundred, though that would be a fairly ugly query to read).

For EACH join you write, having INNER indicates that you only want rows that successfully join those two tables together. If other tables were joined earlier in the query, those results are now completely irrelevant, except to the extent your own join conditions call on them.

For example:

SELECT person.*
FROM person
LEFT JOIN vehicle ON (person.person_id = vehicle.owner_id)
LEFT JOIN house ON (person.person_id = house.owner_id)

Here I want a list of all people, and (if available) all the vehicles and houses they own.

Alternatively:

SELECT person.*
FROM person
INNER JOIN vehicle ON (person.person_id = vehicle.owner_id)
LEFT JOIN house ON (person.person_id = house.owner_id)

Here I want all people who own vehicles (they must own a vehicle to get results in my query), and (if available) all the houses they own).

Each join is completely separate here.

Of course, by varying what you put in the ON clause, you can make joins interrelate tables any way you want.

VoteyDisciple
A: 

For the sake of this example, lets say we have a table "employees" with an ID, NAME and MANAGER_ID fields.

Here is a simple query:

SELECT E.ID, E.NAME, M.NAME AS MANAGER
FROM EMPLOYEES E
JOIN EMPLOYEE M ON E.MANAGER_ID = M.ID

This will return all of the employees, with their manager name. But what happens for the boss? he who has no manager? A database null would actually prevent that row from returning as it could not find a matching record to join on. Thus you would use an OUTER join (left or right depending on how your write the query).

The same logic would hold for writing a query with 2+n joins. If you are possibly going to have rows that don't have matches in your join clause, and want those rows to come back (albeit with nulls), then you are golden.

Goyuix
A: 

On some sql engines there's an issue where you're joining a using left join. If you join A->B->C and the row in B doesn't exist then the join column from B is NULL. A few I've used require that the join from B->C must be a left join if the join from A->B is a left join.

This is ok

select a.*, b.*, c.*
 from a
 left join b on b.id = a.id
 left join c on c.id = b.id

this is not

 select a.*, b.*, c.*
 from a
 left join b on b.id = a.id
 inner join c on c.id = b.id
Jay
This question is limited to SQL Server 2000, 2005, and 2008
MedicineMan
+19  A: 

Yes, I do use all three of those JOINs, although I tend to stick to using just LEFT (OUTER) JOINs instead of inter-mixing LEFT and RIGHT JOINs. I also use FULL OUTER JOINs and CROSS JOINs.

In summary, an INNER JOIN restricts the resultset only to those records satisfied by the JOIN condition. Consider the following tables

EDIT: I've renamed the Table names and prefix them with @ so that Table Variables can be used for anyone reading this answer and wanting to experiment

@Table1

id | name
---------
1  | One
2  | Two
3  | Three
4  | Four

@Table2

id | name
---------
1  | Partridge
2  | Turtle Doves
3  | French Hens
5  | Gold Rings

SQL code

DECLARE @Table1 TABLE (id INT PRIMARY KEY CLUSTERED, [name] VARCHAR(25))

INSERT INTO @Table1 VALUES(1, 'One');
INSERT INTO @Table1 VALUES(2, 'Two');
INSERT INTO @Table1 VALUES(3, 'Three');
INSERT INTO @Table1 VALUES(4, 'Four');

DECLARE @Table2 TABLE (id INT PRIMARY KEY CLUSTERED, [name] VARCHAR(25))

INSERT INTO @Table2 VALUES(1, 'Partridge');
INSERT INTO @Table2 VALUES(2, 'Turtle Doves');
INSERT INTO @Table2 VALUES(3, 'French Hens');
INSERT INTO @Table2 VALUES(5, 'Gold Rings');

An INNER JOIN SQL Statement, joined on the id field

SELECT 
    t1.id,
    t1.name,
    t2.name
FROM
    @Table1 t1
INNER JOIN
    @Table2 t2
    ON 
        t1.id = t2.id

Results in

id | name | name
----------------
1  | One  | Partridge
2  | Two  | Turtle Doves
3  | Three| French Hens

A LEFT JOIN will return a resultset with all records from the table on the left hand side of the join (if you were to write out the statement as a one liner, the table that appears first) and fields from the table on the right side of the join that match the join expression and are included in the SELECT clause. Missing details will be populated with NULL

SELECT 
    t1.id,
    t1.name,
    t2.name
FROM
    @Table1 t1
LEFT JOIN
    @Table2 t2
    ON 
        t1.id = t2.id

Results in

id | name | name
----------------
1  | One  | Partridge
2  | Two  | Turtle Doves
3  | Three| French Hens
4  | Four | NULL

A RIGHT JOIN is the same logic as a LEFT JOIN but will return all records from the right-hand side of the join and fields from the left side that match the join expression and are included in the SELECT clause.

SELECT 
    t1.id,
    t1.name,
    t2.name
FROM
    @Table1 t1
RIGHT JOIN
    @Table2 t2
    ON 
        t1.id = t2.id

Results in

id | name | name
----------------
1  | One  | Partridge
2  | Two  | Turtle Doves
3  | Three| French Hens
NULL| NULL| Gold Rings

Of course, there is also the FULL OUTER JOIN, which includes records from both joined tables and populates any missing details with NULL.

SELECT 
    t1.id,
    t1.name,
    t2.name
FROM
    @Table1 t1
FULL OUTER JOIN
    @Table2 t2
    ON 
        t1.id = t2.id

Results in

id | name | name
----------------
1  | One  | Partridge
2  | Two  | Turtle Doves
3  | Three| French Hens
4  | Four | NULL
NULL| NULL| Gold Rings

And a CROSS JOIN (also known as a CARTESIAN PRODUCT), which is simply the product of cross applying fields in the SELECT statement from one table with the fields in the SELECT statement from the other table. Notice that there is no join expression in a CROSS JOIN

SELECT 
    t1.id,
    t1.name,
    t2.name
FROM
    @Table1 t1
CROSS JOIN
    @Table2 t2

Results in

id | name  | name
------------------
1  | One   | Partridge
2  | Two   | Partridge
3  | Three | Partridge
4  | Four  | Partridge
1  | One   | Turtle Doves
2  | Two   | Turtle Doves
3  | Three | Turtle Doves
4  | Four  | Turtle Doves
1  | One   | French Hens
2  | Two   | French Hens
3  | Three | French Hens
4  | Four  | French Hens
1  | One   | Gold Rings
2  | Two   | Gold Rings
3  | Three | Gold Rings
4  | Four  | Gold Rings

EDIT:

Imagine there is now a Table3

@Table3

id | name
---------
2  | Prime 1
3  | Prime 2
5  | Prime 3

The SQL code

DECLARE @Table3 TABLE (id INT PRIMARY KEY CLUSTERED, [name] VARCHAR(25))

INSERT INTO @Table3 VALUES(2, 'Prime 1');
INSERT INTO @Table3 VALUES(3, 'Prime 2');
INSERT INTO @Table3 VALUES(5, 'Prime 3');

Now all three tables joined with INNER JOINS

SELECT 
    t1.id,
    t1.name,
    t2.name,
    t3.name
FROM
    @Table1 t1
INNER JOIN
    @Table2 t2
    ON 
        t1.id = t2.id
INNER JOIN
    @Table3 t3
    ON 
        t1.id = t3.id

Results in

id | name | name         | name
-------------------------------
2  | Two  | Turtle Doves | Prime 1
3  | Three| French Hens  | Prime 2

It might help to understand this result by thinking that records with id 2 and 3 are the only ones common to all 3 tables and are also the field we are joining each table on.

Now all three with LEFT JOINS

SELECT 
    t1.id,
    t1.name,
    t2.name,
    t3.name
FROM
    @Table1 t1
LEFT JOIN
    @Table2 t2
    ON 
        t1.id = t2.id
LEFT JOIN
    @Table3 t3
    ON 
        t1.id = t3.id

Results in

id | name | name         | name
-------------------------------
1  | One  | Partridge    | NULL
2  | Two  | Turtle Doves | Prime 1
3  | Three| French Hens  | Prime 2
4  | Four | NULL         | NULL

Joel's answer is a good explanation for explaining this resultset (Table1 is the base/origin table).

Now with a INNER JOIN and a LEFT JOIN

SELECT 
    t1.id,
    t1.name,
    t2.name,
    t3.name
FROM
    @Table1 t1
INNER JOIN
    @Table2 t2
    ON 
        t1.id = t2.id
LEFT JOIN
    @Table3 t3
    ON 
        t1.id = t3.id

Results in

id | name | name         | name
-------------------------------
1  | One  | Partridge    | NULL
2  | Two  | Turtle Doves | Prime 1
3  | Three| French Hens  | Prime 2

Although we do not know the order in which the query optimiser will perform the operations, we will look at this query from top to bottom to understand the resultset. The INNER JOIN on ids between Table1 and Table2 will restrict the resultset to only those records satisfied by the join condition i.e. the three rows that we saw in the very first example. This temporary resultset will then be LEFT JOINed to Table3 on ids between Table1 and Tables; There are records in Table3 with id 2 and 3, but not id 1, so t3.name field will have details in for 2 and 3 but not 1.

Russ Cam
Although long, I like this answer because of the the detail that Russ went into to explain the table joins -- very little abstract content here, all of the tables with their data, the query, and the result are spelled out.
MedicineMan
I like the progression of the answer as well: take INNER and INNER case, and then show permutations and the results. It helps that his dataset was designed in a way that clearly illustrates the differences between the queries.
MedicineMan
Thank you - I've added additional detail to allow others to run the statements in SQL Server and see the results, as well as experiment.
Russ Cam
Thank you; great answer!
Abboq
A: 

For the sake of completeness and standard evangelics, I'll chime in with the ansi-92 nested join syntax:

select t1.*
    ,t2.*
    ,t3.*
from table1 t1
    left outer join (
        table2 t2 left outer join table3 t3 on (t2.b = t3.b)
    ) on (t1.a = t2.a)

Your SQL engine of choice may optimize for them.

maero
+1 if you can state what is different about your syntax than what has been written so far.. Is there some requirement of ANSI 92 that is different than above? What standard does the above use? I'm unfamiliar with ANSI 92. Finally, why would someone want to use ANSI 92 instead of the above?
MedicineMan
It's mostly a matter of preference. For readability and what you are most comfortable with.Depending on the complexity of your query, some SQL engines may optimize based on the order of operations, level of nest, etc.In the end, It's whatever the query optimizer says goes -- regardless of syntax.Additional info:http://stackoverflow.com/questions/1094585/sql-difference-between-these-joins
maero