I wonder if anyone can help improve my understanding of JOINs in SQL. [If it is significant to the problem, I am thinking MS SQL Server specifically.]
Take 3 tables A, B [A related to be by some A.AId], and C [B related to C by some B.BId]
If I compose a query e.g
SELECT *
FROM A JOIN B
ON A.AId = B.AId
All good - I'm sweet with how this works.
What happens when Table C (Or some other D,E, .... gets added)
In the situation
SELECT *
FROM A JOIN B
ON A.AId = B.AId
JOIN C ON C.BId = B.BId
What is C joining to? - is it that B table (and the values in the B table?) Or is it some other temporary result set that is the result of the A+B Join that the C table is joined to?
[The implication being not all values that are in the B table will necessarily be in the temporary result set A+B based on the join condition for A,B]
A specific (and fairly contrived) example of why I am asking is because I am trying to understand behaviour I am seeing in the following:
Tables
Account (AccountId, AccountBalanceDate, OpeningBalanceId, ClosingBalanceId)
Balance (BalanceId)
BalanceToken (BalanceId, TokenAmount)
Where:
Account->Opening, and Closing Balances are NULLABLE
(may have opening balance, closing balance, or none)
Balance->BalanceToken is 1:m - a balance could consist of many tokens
Conceptually, Closing Balance of a date, would be tomorrows opening balance
If I was trying to find a list of all the opening and closing balances for an account
I might do something like
SELECT AccountId
, AccountBalanceDate
, Sum (openingBalanceAmounts.TokenAmount) AS OpeningBalance
, Sum (closingBalanceAmounts.TokenAmount) AS ClosingBalance
FROM Account A
LEFT JOIN BALANCE OpeningBal
ON A.OpeningBalanceId = OpeningBal.BalanceId
LEFT JOIN BALANCE ClosingBal
ON A.ClosingBalanceId = ClosingBal.BalanceId
LEFT JOIN BalanceToken openingBalanceAmounts
ON openingBalanceAmounts.BalanceId = OpeningBal.BalanceId
LEFT JOIN BalanceToken closingBalanceAmounts
ON closingBalanceAmounts.BalanceId = ClosingBal.BalanceId
GROUP BY AccountId, AccountBalanceDate
Things work as I would expect until the last JOIN brings in the closing balance tokens - where I end up with duplicates in the result.
[I can fix with a DISTINCT - but I am trying to understand why what is happening is happening]
I have been told the problem is because the relationship between Balance, and BalanceToken is 1:M - and that when I bring in the last JOIN I am getting duplicates because the 3rd JOIN has already brought in BalanceIds multiple times into the (I assume) temporary result set.
I know that the example tables do not conform to good DB design
Apologies for the essay, thanks for any elightenment :)
Edit in response to question by Marc
Conceptually for an account there should not be duplicates in BalanceToken for An Account (per AccountingDate) - I think the problem comes about because 1 Account / AccountingDates closing balance is that Accounts opening balance for the next day - so when self joining to Balance, BalanceToken multiple times to get opening and closing balances I think Balances (BalanceId's) are being brought into the 'result mix' multiple times. If it helps to clarify the second example, think of it as a daily reconciliation - hence left joins - an opening (and/or) closing balance may not have been calculated for a given account / accountingdate combination.