views:

180

answers:

1

We have a C# system throwing up an oddity we can't get to the bottom of with SQL (SQL2k5).

The situation is that we have two seperate processes running simultaneously looking at the same table, both running inside their own transaction in two different serviced COM+ components on two different clusters. Both are talking to Order and OrderItem SQL tables. Both are running in READ COMMITTED isolation level.

Job 1: Loading an order and one or more items into these tables.

Job 2: A scheduled task that occurs every few minutes selecting from said tables finding all orders that have been fully processed.

The problem: What we have found is that four times in the past eight months job 2 has reacted to a partially loaded order from job 1, before job 1 has fully completed (as an item on the order has been no stocked, it's at a finished status, so is seen by the job as the rest of the order is yet to load).

From investigation, whilst job 1 is in progress we are unable to do a:

select * from order

once an order has been inserted into it. However, we can do:

select * fromOrder o inner join OrderItem oi on oi.orderid = o.id

at the same time (why is this?).

On investigation replicating the same circumstances we can't get job 2 to see a partially created order from job 1, it does return some rows but never the order being inserted by job 1. However, this is exactly what is happening in our live environment!

Does anyone have an insight into why this is occurring?

A: 

I would try rolling back some work performed by job 1 as an experiment to confirm that everything you think is included in the transaction is actually removed when the rollback occurs. I, like you, would have suspected that you will never be able to read data, from a different transaction, that would be removed by a rollback when using READ COMMITTED. The fact that you are able to read rows suggests some of them might be going in outside the context of the transaction.

My next thought is to review the meaning of READ COMMITTED. This isolation level supposedly ensures that the data you're reading has been committed by one transaction, but doesn't necessarily ensure that it still can't change as a result of another. Contrast this with REPEATABLE READ where all data remains locked until the end of the transaction, or SERIALIZABLE, which tries to behave as if all transactions were executed in sequence. I suspect that if you change your locking to SERIALIZABLE (the most restrictive) this particular problem would not occur, but that may be overkill.

After considering the meaning of READ COMMITTED compared with other isolation levels, I suspect that the behavior you're seeing is natural in this isolation level because the rows you have inserted will in fact be the rows that will still be there at the end of the transaction. I think READ COMMITTED is not intended to ensure that the query is returning a complete set of rows (which would require a range lock as imposed by SERIALIZABLE), but only that the individual rows you're reading have been completely inserted (so you're not reading a partial row). I could be wrong, but given the behavior you describe, this is my suspicion. If you need to ensure the whole set of rows is complete, I think you need SERIALIZABLE isolation.

Which transaction needs to be SERIALIZABLE? Obviously if they're both SERIALIZABLE, that should work. But it may be possible to make only one of them SERIALIZABLE and still get the correct behavior. My mental powers at the moment are not up to the task of determining what really needs to be SERIALIZABLE, but perhaps some comments by others will shed additional light.

BlueMonkMN