views:

96

answers:

3

Ok so I think I must be misunderstanding something about SQL queries. This is a pretty wordy question, so thanks for taking the time to read it (my problem is right at the end, everything else is just context).

I am writing an accounting system that works on the double-entry principal -- money always moves between accounts, a transaction is 2 or more TransactionParts rows decrementing one account and incrementing another.

Some TransactionParts rows may be flagged as tax related so that the system can produce a report of total VAT sales/purchases etc, so it is possible that a single Transaction may have two TransactionParts referencing the same Account -- one VAT related, and the other not. To simplify presentation to the user, I have a view to combine multiple rows for the same account and transaction:

create view Accounting.CondensedEntryView as
select p.[Transaction], p.Account, sum(p.Amount) as Amount
    from Accounting.TransactionParts p 
    group by p.[Transaction], p.Account

I then have a view to calculate the running balance column, as follows:

create view Accounting.TransactionBalanceView as
with cte as
(
    select ROW_NUMBER() over (order by t.[Date]) AS RowNumber, 
                         t.ID as [Transaction], p.Amount, p.Account
        from Accounting.Transactions t
            inner join Accounting.CondensedEntryView p on p.[Transaction]=t.ID
)
select b.RowNumber, b.[Transaction], a.Account, 
                coalesce(sum(a.Amount), 0) as Balance
    from cte a, cte b
    where a.RowNumber <= b.RowNumber AND a.Account=b.Account
    group by b.RowNumber, b.[Transaction], a.Account

For reasons I haven't yet worked out, a certain transaction (ID=30) doesn't appear on an account statement for the user. I confirmed this by running

select * from Accounting.TransactionBalanceView where [Transaction]=30

This gave me the following result:

RowNumber            Transaction Account Balance
-------------------- ----------- ------- ---------------------
72                   30          23      143.80

As I said before, there should be at least two TransactionParts for each Transaction, so one of them isn't being presented in my view. I assumed there must be an issue with the way I've written my view, and run a query to see if there's anything else missing:

select [Transaction], count(*) 
    from Accounting.TransactionBalanceView 
    group by [Transaction] 
    having count(*) < 2

This query returns no results -- not even for Transaction 30! Thinking I must be an idiot I run the following query:

select [Transaction] 
    from Accounting.TransactionBalanceView 
    where [Transaction]=30

It returns two rows! So select * returns only one row and select [Transaction] returns both. After much head-scratching and re-running the last two queries, I concluded I don't have the faintest idea what's happening. Any ideas?

Thanks a lot if you've stuck with me this far!

Edit:

Here are the execution plans:

select *
select [Transaction]

1000 lines each, hence finding somewhere else to host.

Edit 2:

For completeness, here are the tables I used:

create table Accounting.Accounts
(
ID              smallint        identity primary key,
[Name]          varchar(50)     not null
    constraint UQ_AccountName unique,
[Type]          tinyint         not null
    constraint FK_AccountType foreign key references Accounting.AccountTypes
);

create table Accounting.Transactions
(
ID              int             identity primary key,
[Date]          date            not null default getdate(),
[Description]   varchar(50)     not null,
Reference       varchar(20)     not null default '',
Memo            varchar(1000)   not null
);

create table Accounting.TransactionParts
(
ID              int             identity primary key,
[Transaction]   int             not null
    constraint FK_TransactionPart foreign key references Accounting.Transactions,
Account         smallint        not null
    constraint FK_TransactionAccount foreign key references Accounting.Accounts,
Amount          money           not null,
VatRelated      bit             not null default 0
);
A: 

It seems you read dirty entries. (Someone else deletes/insertes new data)

try SET TRANSACTION ISOLATION LEVEL READ COMMITTED.

i've tried this code (seems equal to yours)

IF object_id('tempdb..#t') IS NOT NULL DROP TABLE #t
CREATE TABLE #t(i INT, val INT, acc int)

INSERT #t 
SELECT 1, 2, 70 
UNION ALL SELECT 2, 3, 70

;with cte as
(
    select ROW_NUMBER() over (order by t.i) AS RowNumber, 
                         t.val as [Transaction], t.acc Account
        from #t t
)
select b.RowNumber, b.[Transaction], a.Account
    from cte a, cte b
    where a.RowNumber <= b.RowNumber AND a.Account=b.Account
    group by b.RowNumber, b.[Transaction], a.Account

and got two rows RowNumber Transaction Account 1 2 70 2 3 70

skaeff
This doesn't seem to fit the facts presented in the question IMO. `READ COMMITTED.` is the default. we have no reason to think he's not using it and this wouldn't explain why repeated executions got the same pattern.
Martin Smith
+4  A: 

Demonstration of possible explanation.

Create table Script

SELECT *
INTO #T
 FROM master.dbo.spt_values

 CREATE NONCLUSTERED INDEX [IX_T] ON  #T ([name] DESC,[number] DESC);

Query one (Returns 35 results)

WITH cte AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY NAME) AS rn
 FROM  #T
)
SELECT c1.number,c1.[type] 
FROM cte c1
JOIN cte c2 ON c1.rn=c2.rn AND c1.number <> c2.number

Query Two (Same as before but adding c2.[type] to the select list makes it return 0 results) ;

WITH cte AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY NAME) AS rn
 FROM  #T
)
SELECT c1.number,c1.[type] ,c2.[type] 
FROM cte c1
JOIN cte c2 ON c1.rn=c2.rn AND c1.number <> c2.number

Why?

row_number() for duplicate NAMEs isn't specified so it just chooses whichever one fits in with the best execution plan for the required output columns. In the second query this is the same for both cte invocations, in the first one it chooses a different access path with resultant different row_numbering.

Execution Plan

Suggested Solution

You are self joining the CTE on ROW_NUMBER() over (order by t.[Date])

Contrary to what may have been expected the CTE will likely not be materialised which would have ensured consistency for the self join and thus you assume a correlation between ROW_NUMBER() on both sides that may well not exist for records where a duplicate [Date] exists in the data.

What if you try ROW_NUMBER() over (order by t.[Date], t.[id]) to ensure that in the event of tied dates the row_numbering is in a guaranteed consistent order. (Or some other column/combination of columns that can differentiate records if id won't do it)

Martin Smith
Ah. I thought that the whole point of a CTE was to create a temporary result set which would be stored for the duration of the query. I think you might be on to something.
gordonml
I've changed the `ROW_NUMBER()` to `ROW_NUMBER() over (order by t.[Date], p.[Transaction], p.Account` and that seems to solve my issue. I think as you say it fully specifies the order in which to number the rows, which ensures they are always given the same row number. As it happens, Transaction 30 was out of sequence -- its date was not in the same order as its transaction id. Thinking about the data and how it's queried, I'm not sure why the row_number being different would cause rows to disappear, but it's been a long week and my brain hurts... Thanks again for your help folks.
gordonml
+1  A: 

If the purpose of this part of the view is just to make sure that the same row isn't joined to itself

where a.RowNumber <= b.RowNumber

then how does changing this part to

where a.RowNumber <> b.RowNumber

affect the results?

TooFat
Good question. It's possible that a.RowNumber must always be lower than b.RowNumber, but I'd be interested to see how the results change if he makes that modification...
AllenG
@AllenG - The point is that there is no reason to assume that there is a guaranteed repeatable correlation between `a.RowNumber` and `b.RowNumber` anyway. See my answer for a demonstration of this.
Martin Smith
@Martin Smith: true, but since we don't know exactly how Transactions are inserted into the database, we don't have any reason to assume that there isn't, either. If, for instance, a.RowNumber always indicates the debit transaction (removing money from one account) and the b.RowNumber indicates the credit (adding money to the 2nd account), it is quite possible that the code would force the debit before the credit.
AllenG
@AllenG It is doing a comparison on `a.RowNumber <= b.RowNumber`. So I don't know why this would be assumed to be to be for the purposes of avoiding the row joining onto itself. I would have thought the expectation of the query writer would be quite the opposite or they would have used `<`. The `RowNumber` is calculated dynamically in the query based on the date order only and is in no way insulated from duplicate dates belonging to different accounts/transactions. If there isn't a duplicate date in the data matching that of transactionid 30's date I'll eat my hat!
Martin Smith
The purpose is to sum all rows up to and including the current one, in order to retrieve the balance for the current row.@Martin Smith - There's no other transaction with the same date. To complete the double entry there's another `TransactionPart` though (the items that are being ranked), which shouldn't affect it because its on a seperate account. I'd find a tasty hat just in case...
gordonml
This is where I buy my hats http://www.hatsofmeat.com/ so I'm not worried! Have you run `Select ROW_NUMBER() over (order by t.[Date]) AS RowNumber, t.ID as [Transaction], p.Amount, p.Account, t.[Date] from Accounting.Transactions t inner join Accounting.CondensedEntryView p on p.[Transaction]=t.ID` with no filters and looked at the `Row_Number()` and dates?
Martin Smith