ansaurus

Question

SQL Server: row present in one query, missing in another

Answer 1

A:

It seems you read dirty entries. (Someone else deletes/insertes new data)

try SET TRANSACTION ISOLATION LEVEL READ COMMITTED.

i've tried this code (seems equal to yours)

IF object_id('tempdb..#t') IS NOT NULL DROP TABLE #t
CREATE TABLE #t(i INT, val INT, acc int)

INSERT #t 
SELECT 1, 2, 70 
UNION ALL SELECT 2, 3, 70

;with cte as
(
    select ROW_NUMBER() over (order by t.i) AS RowNumber, 
                         t.val as [Transaction], t.acc Account
        from #t t
)
select b.RowNumber, b.[Transaction], a.Account
    from cte a, cte b
    where a.RowNumber <= b.RowNumber AND a.Account=b.Account
    group by b.RowNumber, b.[Transaction], a.Account

and got two rows RowNumber Transaction Account 1 2 70 2 3 70

skaeff 2010-07-29 12:02:43

This doesn't seem to fit the facts presented in the question IMO. `READ COMMITTED.` is the default. we have no reason to think he's not using it and this wouldn't explain why repeated executions got the same pattern.

Martin Smith 2010-07-29 12:06:13

Answer 2

+4 A:

Demonstration of possible explanation.

Create table Script

SELECT *
INTO #T
 FROM master.dbo.spt_values

 CREATE NONCLUSTERED INDEX [IX_T] ON  #T ([name] DESC,[number] DESC);

Query one (Returns 35 results)

WITH cte AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY NAME) AS rn
 FROM  #T
)
SELECT c1.number,c1.[type] 
FROM cte c1
JOIN cte c2 ON c1.rn=c2.rn AND c1.number <> c2.number

Query Two (Same as before but adding c2.[type] to the select list makes it return 0 results) ;

WITH cte AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY NAME) AS rn
 FROM  #T
)
SELECT c1.number,c1.[type] ,c2.[type] 
FROM cte c1
JOIN cte c2 ON c1.rn=c2.rn AND c1.number <> c2.number

Why?

row_number() for duplicate NAMEs isn't specified so it just chooses whichever one fits in with the best execution plan for the required output columns. In the second query this is the same for both cte invocations, in the first one it chooses a different access path with resultant different row_numbering.

Execution Plan

Suggested Solution

You are self joining the CTE on ROW_NUMBER() over (order by t.[Date])

Contrary to what may have been expected the CTE will likely not be materialised which would have ensured consistency for the self join and thus you assume a correlation between ROW_NUMBER() on both sides that may well not exist for records where a duplicate [Date] exists in the data.

What if you try ROW_NUMBER() over (order by t.[Date], t.[id]) to ensure that in the event of tied dates the row_numbering is in a guaranteed consistent order. (Or some other column/combination of columns that can differentiate records if id won't do it)

Martin Smith 2010-07-29 12:15:38

Ah. I thought that the whole point of a CTE was to create a temporary result set which would be stored for the duration of the query. I think you might be on to something.

gordonml 2010-07-29 19:15:24

I've changed the `ROW_NUMBER()` to `ROW_NUMBER() over (order by t.[Date], p.[Transaction], p.Account` and that seems to solve my issue. I think as you say it fully specifies the order in which to number the rows, which ensures they are always given the same row number. As it happens, Transaction 30 was out of sequence -- its date was not in the same order as its transaction id. Thinking about the data and how it's queried, I'm not sure why the row_number being different would cause rows to disappear, but it's been a long week and my brain hurts... Thanks again for your help folks.

gordonml 2010-07-29 21:44:26

Answer 3

+1 A:

If the purpose of this part of the view is just to make sure that the same row isn't joined to itself

where a.RowNumber <= b.RowNumber

then how does changing this part to

where a.RowNumber <> b.RowNumber

affect the results?

TooFat 2010-07-29 13:31:43

Good question. It's possible that a.RowNumber must always be lower than b.RowNumber, but I'd be interested to see how the results change if he makes that modification...

AllenG 2010-07-29 14:20:01

@AllenG - The point is that there is no reason to assume that there is a guaranteed repeatable correlation between `a.RowNumber` and `b.RowNumber` anyway. See my answer for a demonstration of this.

Martin Smith 2010-07-29 15:38:40

@Martin Smith: true, but since we don't know exactly how Transactions are inserted into the database, we don't have any reason to assume that there isn't, either. If, for instance, a.RowNumber always indicates the debit transaction (removing money from one account) and the b.RowNumber indicates the credit (adding money to the 2nd account), it is quite possible that the code would force the debit before the credit.

AllenG 2010-07-29 15:52:49

@AllenG It is doing a comparison on `a.RowNumber <= b.RowNumber`. So I don't know why this would be assumed to be to be for the purposes of avoiding the row joining onto itself. I would have thought the expectation of the query writer would be quite the opposite or they would have used `<`. The `RowNumber` is calculated dynamically in the query based on the date order only and is in no way insulated from duplicate dates belonging to different accounts/transactions. If there isn't a duplicate date in the data matching that of transactionid 30's date I'll eat my hat!

Martin Smith 2010-07-29 16:49:59

The purpose is to sum all rows up to and including the current one, in order to retrieve the balance for the current row.@Martin Smith - There's no other transaction with the same date. To complete the double entry there's another `TransactionPart` though (the items that are being ranked), which shouldn't affect it because its on a seperate account. I'd find a tasty hat just in case...

gordonml 2010-07-29 19:21:15

This is where I buy my hats http://www.hatsofmeat.com/ so I'm not worried! Have you run `Select ROW_NUMBER() over (order by t.[Date]) AS RowNumber, t.ID as [Transaction], p.Amount, p.Account, t.[Date] from Accounting.Transactions t inner join Accounting.CondensedEntryView p on p.[Transaction]=t.ID` with no filters and looked at the `Row_Number()` and dates?

Martin Smith 2010-07-29 19:34:54

ansaurus

tags:

views:

answers:

SQL Server: row present in one query, missing in another

Edit:

Edit 2:

related questions