views:

132

answers:

4

Hello:

I am working on a work queueing solution. I want to query a given row in the database, where a status column has a specific value, modify that value and return the row, and I want to do it atomically, so that no other query will see it:


begin transaction
select * from table where pk = x and status = y
update table set status = z where pk = x
commit transaction
--(the row would be returned)

it must be impossible for 2 or more concurrent queries to return the row (one query execution would see the row while its status = y) -- sort of like an interlocked CompareAndExchange operation.

I know the code above runs (for SQL server), but will the swap always be atomic?

I need a solution that will work for SQL Server and Oracle

A: 

I have some applications that follow a similar pattern. There is a table like yours that represents a queue of work. The table has two extra columns: thread_id and thread_date. When the app asks for work froom the queue, it submits a thread id. Then a single update statement updates all applicable rows with the thread id column with the submitted id and the thread date column with the current time. After that update, it selects all rows with that thread id. This way you dont need to declare an explicit transaction. The "locking" occurs in the initial update.

The thread_date column is used to ensure that you do not end up with orphaned work items. What happens if items are pulled from the queue and then your app crashes? You have to have the ability to try those work items again. So you might grab all items off the queue that have not been marked completed but have been assigned to a thread with a thread date in the distant past. Its up to you to define "distant."

Matt Wrock
+1  A: 

As a general rule, to make an operation like this atomic you'll need to ensure that you set an exclusive (or update) lock when you perform the select so that no other transaction can read the row before your update.

The typical syntax for this is something like:

 select * from table where pk = x and status = y for update

but you'd need to look it up to be sure.

Adrian Pronk
Note, doing this doesn't stop another session reading that row, UNLESS that other session is also using the "for update" clause.
Matthew Watson
+1  A: 

Try this. The validation is in the UPDATE statement.

Code

IF EXISTS (SELECT * FROM sys.tables WHERE name = 't1')
    DROP TABLE dbo.t1
GO
CREATE TABLE dbo.t1 (
    ColID       int         IDENTITY,
    [Status]    varchar(20)
)
GO

DECLARE @id             int
DECLARE @initialValue   varchar(20)
DECLARE @newValue       varchar(20)

SET @initialValue = 'Initial Value'

INSERT INTO dbo.t1 (Status) VALUES (@initialValue)
SELECT @id = SCOPE_IDENTITY()

SET @newValue = 'Updated Value'

BEGIN TRAN

UPDATE dbo.t1
SET
    @initialValue = [Status],
    [Status]      = @newValue
WHERE ColID    = @id
  AND [Status] = @initialValue

SELECT ColID, [Status] FROM dbo.t1

COMMIT TRAN

SELECT @initialValue AS '@initialValue', @newValue AS '@newValue'

Results

ColID Status
----- -------------
    1 Updated Value

@initialValue @newValue
------------- -------------
Initial Value Updated Value
Rob Garrison
+3  A: 

Is PK the primary key? Then this is a non issue, if you already know the primary key there is no sport. If pk is the primary key, then this begs the obvious question how do you know the pk of the item to dequeue...

The problem is if you don't know the primary key and want to dequeue the next 'available' (ie. status = y) and mark it as dequeued (delete it or set status = z).

The proper way to do this is to use a single statement. Unfortunately the syntax differs between Oracle and SQL Server. The SQL Server syntax is:

update top (1) [<table>]
set status = z 
output DELETED.*
where  status = y;

I'm not familiar enough with Oracle's RETURNING clause to give an example similar to SQL's OUTPUT one.

Other SQL Server solutions require lock hints on the SELECT (with UPDLOCK) to be correct. In Oracle the preffered avenue is use the FOR UPDATE, but that does not work in SQL Server since FOR UPDATE is to be used in conjunction with cursors in SQL.

In any case, the behavior you have in the original post is incorrect. Multiple sessions can all select the same row(s) and even all update it, returning the same dequeued item(s) to multiple readers.

Remus Rusanu
Kudos. OUTPUT is a great way to do exactly what he's looking for in a single command.
Rob Garrison
@Remus: I will have a look at this. We often do know the pk -- I don't rely upon polling as my online notification mechanism that an item has been submitted to the queue. The service reading the queue recieves a remoted message. The RPC notification includes the pk, but not the payload, as the payload may be large. However, if the service should go down, it has to recover from the table. I'm trying to eliminate a potential race condition, where a recovering service may load a new entry from the table, and recieve the remoted notification.
JMarsch
@JMarsh: I see. If I understand correctly most access will be done via the PK, but every so often (during a service recovery) an access will be done based on some other criteria (and implicitly other index order). This is very tricky to get right, you may encounter a deadlock because of index access order, see http://rusanu.com/2009/05/16/readwrite-deadlock/ (on SQL Server at least).
Remus Rusanu
@Remus: That is a very interesting article! I think that want to avoid using the database to control the concurrency here. I'm considering an algorithm in the middle tier that will not require CAS in the db. I'm very strongly considering including the payload in the RPC notification, and thus avoiding the race condition altogether (perhaps leveraging MSMQ to deliver the notification and payload).
JMarsch
Make sure you don't throw the baby out with the bath water. Whatever concurency issues exist in the DB, they will likely exist, in one for or another, in the mid tier. Who is delivering your notifications? Perhaps rather than moving more to the mid tier, you can move more to the db (ie. use SQL's own messaging http://msdn.microsoft.com/en-us/library/ms166104.aspx).
Remus Rusanu
I would like to be able to use SQL's messaging, but I have to support an Oracle back-end as well. Agreed that if I move it to the middle tier I still have to handle the concurrency issue, but I have a code model that I believe would work. The comment section on this page doesn't afford me enough space to effectively describe it. If it works out, I'll post it back to this thread as a full response. The notifications will be delivered via .net/WCF (Windows Communication Foundation). Alternatively, I might use MSMQ to deliver the notifications.
JMarsch
You can use WCF with a queued channel backed by MSMQ, is a slightly easier programming model than pure MSMQ.
Remus Rusanu
I think that's the direction I'm going with this. I'm going to go ahead and mark this entry as the answer. Thanks for talking through this with me, Remus.
JMarsch