ansaurus

Question

Understanding how JOIN works when 3 or more tables are involved. [SQL]

Answer 1

+2 A:

We know that the data from B is going to be filtered by the (inner) join to A (the data in A is also filtered). So if we (inner) join from B to C, thus the set C is also filtered by the relationship to A. And note also that any duplicates from the join will be included.

However; what order this happens in is up to the optimizer; it could decide to do the B/C join first then introduce A, or any other sequence (probably based on the estimated number of rows from each join and the appropriate indexes).

HOWEVER; in your later example you use a LEFT OUTER join; so Account is not filtered at all, and may well my duplicated if any of the other tables have multiple matches.

Are there duplicates (per account) in BalanceToken?

Marc Gravell 2009-07-05 08:30:14

Hi Marc - thanks for your response, I have edited the initial question with information in response to your question

Delaney 2009-07-05 08:52:43

Are you sure on that part, where say the optimizer would decide which join he does first? I think the second join uses the results from the first and so, the first join has to be execute before second, right?

Tim Büthe 2009-07-05 09:12:45

The final result needs to respect the semantics of the query - but the optimizer can start pretty much anywhere (the results must be the same, though). For example, if we are inner-joining a table with 1M rows to a table with 10 rows, it probably makes sense to star with the 10. Obviously this is different for any kind of outer join - but the optimizer still has lots of options **as long as the result is the same**.

Marc Gravell 2009-07-05 09:36:54

Answer 2

+1 A:

I often find it helps to view the actual execution plan. In query analyser/management studio, you can turn this on for queries from the Query menu, or use Ctrl+M. After running the query, the plan that was executed is shown in another result tab. From this you'll see that C and B are joined first, and then the result is joined with A. The plan might vary depending on information the DBMS has because both joins are inner, making it A-and-B-and-C. What I mean is that the result will be the same regardless of which is joined first, but the time it takes might differ greatly, and this is where the optimiser and hints come into play.

Bernhard Hofmann 2009-07-05 09:04:15

Answer 3

+3 A:

Conceptually here is what happens when you join three tables together.

The optimizer comes up with a plan, which includes a join order. It could be A, B, C, or C, B, A or any of the combinations
The query execution engine applies any predicates (WHERE clause) to the first table that do not invovle any of the other tables. It selects out the columns mentioned in the join conditions or the select list or the order by list. Call this result A
It joins this result set to the second table. For each row it joins to the second table, applying any predicates that may apply to the second table. This results in another temporary resultset.
Then it joins in the final table and applies the ORDER BY

This is conceptually what happens. Infact there are many possible optimizations along the way. The advantage of the relational model is that the sound mathematical basis makes various transformations of plan possible while not changing the correctness.

For example, there is really no need to generate the full result sets along the way. The ORDER BY may instead by done via accessing the data using an index in the first place. There are lots of types of joins that can be done as well.

WW 2009-07-05 09:23:06

Thanks WW - I think that clears up what I was wondering - subsequent joins (table 3 onwards) happen against / to the 'last' interim result set that is being built along the way - not the original tables.So saying from the first example, ... B JOIN C ON B.BId = C.BId - joins agains rows that are in the A+B intermediate resultset NOT against the rows that are in the B table.[conceptually]

Delaney 2009-07-05 10:06:17

Yes, only two result sets are ever joined together at once and it builds up. This is why the join order is very important to the plan.

WW 2009-07-05 10:35:44

Answer 4

+1 A:

Joins can be tricky, and much of the behavior is of course dictated by how the data is stored in the actual tables.

Without seeing the tables it's hard to give a clear answer in your particular case but I think the basic issue is that you are summing over multiple result sets that are being combined into one.

Perhaps instead of multiple joins you should make two separate temporary tables in your query, one with the accountID, date and sum of openingbalances, a second one with the accountID, date and sum of closing balances, then joining those two on AccountID and date.

In order to find out exactly what is happening with joins, also in your specific case, I would do the following:

Change the initial part

SELECT accountID Accountbalancedate, sum(...) as openingbalance, sum(...) as closingbalance FROM

to simply

"SELECT * FROM"

Study the resulting table, and you will see exactly what data is being duplicated. Remove the joins one by one and see what happens. This should give you a clue to what it is about your particular data that is causing the dupes.

If you open the query in SQL server management studio (Free version exists) you can edit the query in the designer. The visual view of how the tables are being joined might also help you realize what's going on.

Console 2009-07-05 10:21:27

ansaurus

tags:

views:

answers:

Understanding how JOIN works when 3 or more tables are involved. [SQL]

related questions