Suppose I have these two tables:

Invoice
-------
iInvoiceID int PK not null
dtCompleted datetime null

InvoiceItem
-----------
iInvoiceItemID int PK not null
iInvoiceID int FK (Invoice.iInvoiceID) not null
dtCompleted datetime null

Each InvoiceItem might be fulfilled by a different process (executable) that runs on a different machine. When the process is complete, I want it to call a stored procedure to stamp the InvoiceItem.dtCompleted field, and I want this stored procedure to return back a flag indicating whether the entire invoice has been completed. Whichever process happens to be the one that finished the invoice is going to kick off another process to do some final business logic on the invoice, e.g. stamp the dtCompleted and send a receipt email. Obviously I want this other process to fire only once for a given Invoice.

Here is my naive implementation:

CREATE PROCEDURE dbo.spuCompleteInvoiceItem
    @iInvoiceItemID INT 
AS
BEGIN
    BEGIN TRAN

        UPDATE InvoiceItem 
        SET dtCompleted = GETDATE()
        WHERE iInvoiceItemID = @iInvoiceItemID

        IF EXISTS(SELECT * FROM InvoiceItem WHERE dtCompleted IS NULL 
                  AND iInvoiceID = (SELECT iInvoiceID FROM InvoiceItem
                                   WHERE iInvoiceItemID=@iInvoiceItemID))
            SELECT 'NotComplete' AS OverallInvoice
        ELSE
            SELECT 'Complete' AS OverallInvoice
    COMMIT
END

Is this sufficient? Or do I need to increase the transaction serialization level and if so, what level would provide the best balance of performance and safety?

Pre-emptive comments:

  • I know I could achieve the same business goal by implementing a central concurrency service at the process/executable level, but I think that's overkill. My instinct is that if I craft my stored procedure and transaction well, I can use SQL Server as my inter-process concurrency service for this simple operation without heavily impacting performance or increasing deadlock frequency (have my cake and eat it too).
  • I'm not worrying about error handling in this example. I'll add the proper TRY/CATCH/ROLLBACK/RAISERROR stuff after.

Update 1:

According to the experts, not only do I need the most restrictive transaction isolation level -- serializable -- but I also need to lock all the InvoiceItems of a particular invoice before I do anything else, to ensure that other concurrent calls to the stored procedure will block until the current one completes. Otherwise I might get deadlocks. Here's my latest version of the implementation:

CREATE PROCEDURE dbo.spuCompleteInvoiceItem
        @iInvoiceItemID INT 
    AS
    BEGIN
        IF @iInvoiceItemID IS NULL
        BEGIN
            RAISERROR('@iInvoiceItemID cannot be null.', 16, 1)
            RETURN
        END

        BEGIN TRAN

            SET TRANSACTION ISOLATION LEVEL SERIALIZABLE

            DECLARE @iInvoiceID INT

            SELECT @iInvoiceID = iInvoiceID
            FROM InvoiceItem
            WHERE dtCompleted IS NULL 
            AND iInvoiceID = (SELECT iInvoiceID FROM InvoiceItem WHERE iInvoiceItemID=@iInvoiceItemID)

            IF @iInvoiceID IS NULL
            BEGIN
                -- Should never happen
                SELECT 'AlreadyComplete' AS Result
            END
            ELSE
            BEGIN
                UPDATE InvoiceItem SET dtCompleted = GETDATE() WHERE iInvoiceItemID = @iInvoiceItemID

                IF EXISTS(SELECT * FROM InvoiceItem WHERE iInvoiceID=@iInvoiceID AND dtCompleted IS NULL)
                    SELECT 'NotComplete' AS Result
                ELSE
                    SELECT 'Complete' AS Result
            END

        COMMIT
    END

Thanks,

Jordan Rieger

A: 

Your problem here is that if two processes simultaneously complete the remaining two items, they are both going to think they finished the invoice.

What you want is a lock on all the other detail items in the invoice to ensure nothing else can change their status whilst one process is updating. This will of course reduce concurrency, but the locking should be limited to the current invoice and held only briefly.

You can do this using the SERIALIZABLE isolation level to ensure that you cannot get phantom reads:


  DECLARE @iInvoiceId int

  BEGIN TRAN

  SET TRANSACTION ISOLATION LEVEL SERIALIZABLE

  SET NOCOUNT ON

  -- This select locks the rows and ensures that repeated
  -- selects will produce the same result
  -- ie no other transaction can affect these rows,
  -- or insert a row into this invoice
  SELECT @iInvoiceId = iInvoiceId FROM InvoiceItem WITH (xlock)
  WHERE iInvoiceId = (
    SELECT iInvoiceId FROM InvoiceItem WHERE iInvoiceItemId = @iInvoiceItemId)

  SET NOCOUNT OFF
  -- perform rest of query as before

  COMMIT
KeeperOfTheSoul
Thanks, you understood my question, and the problem I am trying to avoid, perfectly. It seems ugly to do a SELECT before my UPDATE just to guarantee a bunch of row locks. Isn't there a table hint that I could provide in my UPDATE that would force it to be atomically linked to the following SELECT statement, e.g. "WITH (TABLOCKX)"? I imagine I would reduce concurrency if I lock the whole table, even though it's a very short operation. Maybe the SELECT is not so ugly after all... Also, can you explain the purpose of the TOP 1 clause and READPAST hint here? Don't we want to lock all rows?
The use of TOP 1 means it doesn't enumerate every row, but READPAST allows it to apply locks to the rows that would have been queried were it not for the TOP clause.
KeeperOfTheSoul
I've altered the query slightly; it turns out READPAST cannot be used with SERIALIZABLE, so I've removed the TOP 1 and changed the lock hint to XLOCK.
KeeperOfTheSoul
A: 

You have two alternatives:

  1. Stateful with short transactions. Mark the status of invoices being processed. The job picks an invoice to be processed and updates its status to 'processing' (pick-and-update atomically), then commits. It processes the invoice, then comes back and updates the status to 'complete'. No other job can process the same invoice because the invoice is marked 'processing'. This is the typical queue-based workflow.

  2. Stateless with long transactions. Look up an invoice to process and lock it (UPDLOCK). In practice this is done by doing the complete update at the beginning of the transaction, thus locking the invoice in X mode. Keep the transaction open while the invoice is processed. At the end, mark it as complete and commit.
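
A minimal sketch of the first alternative, assuming a hypothetical iStatus column on Invoice (0 = available, 1 = processing, 2 = complete; not part of the schema in the question):

```sql
DECLARE @iInvoiceID INT

-- Atomically claim one available invoice. READPAST skips rows
-- already locked by other jobs; UPDLOCK/ROWLOCK ensure no two jobs
-- claim the same invoice. iStatus is a hypothetical column.
UPDATE TOP (1) Invoice WITH (UPDLOCK, READPAST, ROWLOCK)
SET iStatus = 1,
    @iInvoiceID = iInvoiceID
WHERE iStatus = 0

IF @iInvoiceID IS NOT NULL
BEGIN
    -- ... process the invoice's items here, outside any long-lived
    -- transaction ...
    UPDATE Invoice SET iStatus = 2 WHERE iInvoiceID = @iInvoiceID
END
```

The claim itself commits immediately, so other jobs are never blocked for the duration of the processing; a crashed job leaves the invoice in 'processing' until the garbage-collecting job described below resets it.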

There is nothing transaction isolation levels can do to help you here. They only affect the duration and scope of S-locks, and S-locks have no way of preventing two jobs from attempting to process the same invoice, leading to blocking and deadlocks.

If the 'processing' is of any length, then you must use the stateful short transactions, since holding long-transaction locks in the database will kill every other activity. The drawback is that jobs can crash without completing the processing and leave invoices in an abandoned 'processing' state. Usually this is resolved by a 'garbage collecting' job that resets the status back to 'available' if they don't complete in the allotted time.

Update

OK. Then the EXISTS query should have a WHERE clause with the InvoiceID, shouldn't it? As it is now, it would return 'Complete' when all invoice items, from all invoices, have been stamped with a complete date.

Anyway, that final check for complete is a guaranteed deadlock, on any isolation level: T1 updates item N-1 and selects the EXISTS. T2 updates item N and selects the EXISTS. T1 blocks on the T2 update, T2 blocks on the T1 update: deadlock. No isolation level can help there, and it is an extremely likely scenario, because the cause of the deadlock is the pre-existing update, not the SELECT.

Ultimately the problem is caused by the fact that parallel processors dequeue correlated items. As long as you allow this to happen, deadlocks are going to be an everyday (or even every-second...) fact of life with your processing. I know this because, as a developer with SQL Server in Redmond, I've spent the better part of the past 10 years in this problem space. This is why Service Broker (the built-in queues of SQL Server) does Conversation Group Locking: to isolate the processing of correlated messages.

Unless you ensure that items from one invoice are only processed by one job, you will spend the rest of your days solving new deadlock scenarios in item processing. The best you can do is come up with a very restrictive locking scheme that blocks the entire invoice upfront, but that in effect is exactly what I'm telling you to do (block access to correlated items).
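
One built-in way to get this kind of per-invoice serialization, without relying on row or range locks at all, is a SQL Server application lock keyed on the invoice ID. A sketch (the resource-name format is just a convention):

```sql
BEGIN TRAN

DECLARE @sResource NVARCHAR(255)
SET @sResource = 'Invoice_' + CAST(@iInvoiceID AS NVARCHAR(10))

-- Blocks until the named lock is granted or the timeout elapses.
-- A 'Transaction'-owned applock is released automatically at
-- COMMIT/ROLLBACK, so all work on this invoice is serialized.
DECLARE @iResult INT
EXEC @iResult = sp_getapplock
    @Resource = @sResource,
    @LockMode = 'Exclusive',
    @LockOwner = 'Transaction',
    @LockTimeout = 10000

IF @iResult < 0
BEGIN
    ROLLBACK
    RAISERROR('Could not acquire lock for invoice.', 16, 1)
    RETURN
END

-- ... stamp the item and run the completeness check here ...

COMMIT
```

Because every caller takes the same single lock before touching any invoice rows, callers for the same invoice queue up in order and cannot deadlock against each other on item rows.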

Remus Rusanu
Thanks, but this is mostly beyond my question scope. For my question, I'm only concerned with guaranteeing that my stored procedure will only return 'Complete' once for a given invoice. For that part, transaction isolation level definitely has an impact. The job that processes completed invoices is already implemented as a queue.
I added an update after reading your comments
Remus Rusanu
I also noticed the problem with the WHERE clause and corrected it. Let me read the rest of your comment and try to process it.
A: 

In my view, any need for serializable/repeatable reads must be unconditionally avoided. It is a result of incorrect thinking about the nature of concurrency in an RDBMS and severely limits scalability. On many platforms, including Oracle, such devices simply don't exist.

My recommendation is to check the condition in the update statement.

UPDATE InvoiceItem
SET dtCompleted = GETDATE()
WHERE iInvoiceItemID = @iInvoiceItemID
AND dtCompleted IS NULL   -- the key addition

IF (@@ROWCOUNT = 1)
    ...win
ELSE
    ...lose

If rowcount is 1 then you know your update succeeded and you can proceed with whatever postprocessing you need to do.

If rowcount is 0 then you know there is either a problem (Invoice Item does not exist) or more likely something else beat your process to it and set dtCompleted so you should not continue.

If all processing is done in the same transaction, I recommend staking your claim up front. If it's a separate process, you just need to be very careful about the demarcations between transactions to maintain consistency.

Einstein
A: 

@Einstein,

I think most people would agree that unconditionally avoiding serializable/repeatable reads is too extreme. Isn't it reasonable to leverage the concurrency features of the RDBMS in specific, limited circumstances, when a specialized concurrency system is not warranted, and after careful consideration of the side effects? I've learned to be wary of absolute rules in software development -- I always seem to encounter cases where it's reasonable to break them :-)

Anyway, the UPDATE statement you provided doesn't quite work. Consider an Invoice with two InvoiceItems processed in the following sequence:

  • process A completes InvoiceItem A
  • process B completes InvoiceItem B
  • process A checks ROWCOUNT: 1
  • process B checks ROWCOUNT: 1
  • process A believes it is the one that completed the Invoice
  • process B believes it is the one that completed the Invoice

If it's not clear from my original question, you can assume I already have a system that ensures that each InvoiceItem is only processed once. All I'm trying to do is ensure that the process that completes the last InvoiceItem can safely kick off a post-processing function for the entire Invoice, and that this function is only executed once per invoice.
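
Combining the advice from both answers (locking the whole invoice up front, and staking the claim before checking), here is a sketch that serializes on the parent Invoice row, so that only one caller per invoice can ever reach the 'Complete' result:

```sql
BEGIN TRAN

DECLARE @iInvoiceID INT

-- Exclusively lock the parent Invoice row first. Every concurrent
-- caller for the same invoice blocks on this single row, so the
-- update-then-check below cannot interleave or deadlock between
-- callers of this procedure.
SELECT @iInvoiceID = i.iInvoiceID
FROM Invoice i WITH (XLOCK, ROWLOCK)
JOIN InvoiceItem ii ON ii.iInvoiceID = i.iInvoiceID
WHERE ii.iInvoiceItemID = @iInvoiceItemID

UPDATE InvoiceItem
SET dtCompleted = GETDATE()
WHERE iInvoiceItemID = @iInvoiceItemID

IF EXISTS (SELECT * FROM InvoiceItem
           WHERE iInvoiceID = @iInvoiceID AND dtCompleted IS NULL)
    SELECT 'NotComplete' AS Result
ELSE
    SELECT 'Complete' AS Result   -- reachable by exactly one caller

COMMIT
```

Because there is a single lock point acquired before any other work, two callers for the same invoice can never hold locks the other needs, which avoids the N-1/N deadlock described above.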