I'm writing a background service that needs to process a series of jobs, stored as records in a SQL Server table. The service needs to find the oldest 20 jobs that need to be worked (where status = 'new'), mark them (set status = 'processing'), run them, and update the jobs afterward.

It's the first part I need help with. There could be multiple threads accessing the database at the same time, and I want to make sure that the "mark & return" query runs atomically, or nearly so.

This service will be spending comparatively little time accessing the database, and it's not the end of the world if a job gets run twice, so I might be able to accept a small probability of jobs running more than once for increased simplicity in the code.

What is the best way to do this? I'm using LINQ to SQL for my data layer, but I assume I'll have to drop down into T-SQL for this.

+7  A: 

Please see my answer here: SQL Server Process Queue Race Condition which also manages 20 rows in one go.

Basically, it's quite simple in SQL Server to manage concurrency and polling using the hints ROWLOCK, READPAST and UPDLOCK.

I can't comment on LINQ, but a transaction alone still leaves you open to concurrency issues: you need to use the hints I mentioned.
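As a minimal sketch of that pattern (the table and `enqueue_time` column names are assumed from the question, not from the linked answer):

```
-- Sketch only: select and mark the oldest 20 'new' jobs inside one
-- transaction. READPAST skips rows already locked by other workers,
-- UPDLOCK holds an update lock on selected rows until commit, and
-- ROWLOCK keeps the locking granularity at the row level.
BEGIN TRAN;

SELECT TOP (20) id
FROM your_table WITH (ROWLOCK, READPAST, UPDLOCK)
WHERE status = 'new'
ORDER BY enqueue_time;

-- ...set status = 'processing' on the selected rows here...

COMMIT;
```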

gbn
+1  A: 

I know it's off topic, but for this you could use MSMQ. A message queue would put your jobs in sequence, and it's thread safe. You can also assign priorities, which MSMQ manages itself. You can use Receive or Peek to remove a message from the queue or just see what's there. You can use the command design pattern to help you with this.

Stéphane
Queueing is the answer, but why MSMQ when SQL Server comes with built in queues?
Remus Rusanu
The way I use them is to control processes. When I queue something I don't use the database at all, so any listener can get a job to do. I tested it with 5 computers running 10 processes each and I never got a concurrency problem. I guess it depends where you want your queue to reside.
Stéphane
A: 

Is it not just as simple as running your T-SQL within a transaction, or am I missing something?

birdus
+4  A: 

Building on gbn's answer...

If you're using SQL Server 2005 or newer, you can return the updated rows atomically by using an OUTPUT clause in your UPDATE statement:

UPDATE TOP (20) your_table
SET status = 'processing'
OUTPUT INSERTED.*
FROM your_table WITH (ROWLOCK, READPAST, UPDLOCK)
WHERE status = 'new'

LukeH
+6  A: 

Your table of jobs is a queue. Writing user-table-backed queues is notoriously error prone, as it leads to deadlocks and concurrency issues.

The simplest thing would be to drop the user table and use a true queue instead. This will give you a deadlock-free, concurrency-safe queue on a system-tested and validated code base. The problem is that the whole paradigm around queues changes from INSERT and DELETE/UPDATE to SEND/RECEIVE. On the other hand, with the built-in queues you get some very powerful free goodies, namely Activation and correlated-item locking.
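For illustration only (the queue name `JobQueue` and the selected columns are hypothetical), dequeuing a batch from a Service Broker queue replaces the UPDATE with a RECEIVE:

```
-- Sketch: pull up to 20 messages from a Service Broker queue,
-- waiting up to 5 seconds for work to arrive. RECEIVE removes the
-- messages atomically, so competing readers never see the same item.
WAITFOR (
    RECEIVE TOP (20)
        conversation_handle,
        message_type_name,
        message_body
    FROM JobQueue
), TIMEOUT 5000;
```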

If you want to continue down the path of user-table-backed queues, then the second most important trick in writing them is to use UPDATE ... OUTPUT:

WITH cte AS (
  SELECT TOP(20) status, id, ...
  FROM your_table WITH (ROWLOCK, READPAST, UPDLOCK)
  WHERE status = 'new'
  ORDER BY enqueue_time)
UPDATE cte
  SET status = 'processing'
OUTPUT
  INSERTED.id, ...

The CTE syntax is just for the convenience of placing the TOP and ORDER BY properly; the query can be written using derived tables just as easily. You cannot use a straight UPDATE ... TOP because UPDATE does not support ORDER BY, and you need it to satisfy the 'oldest' part of your requirement. The lock hints are needed to facilitate high concurrency between parallel processing threads.

I said this is the second most important trick. The most important is how you organize the table: for a queue it must be clustered by (status, enqueue_time). If you don't organize the table properly you'll end up with deadlocks. Pre-emptive comment: fragmentation is irrelevant in this scenario.
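A sketch of that table organization (column names beyond `status` are assumptions based on the question; the nonclustered primary key keeps `id` unique while the clustered index serves the dequeue scan):

```
-- Hypothetical job-queue table, clustered on (status, enqueue_time)
-- so the dequeue query touches only the 'new' rows, in age order.
CREATE TABLE your_table (
    id            INT IDENTITY  NOT NULL,
    status        VARCHAR(20)   NOT NULL DEFAULT 'new',
    enqueue_time  DATETIME      NOT NULL DEFAULT GETDATE(),
    payload       NVARCHAR(MAX) NULL,
    CONSTRAINT PK_your_table PRIMARY KEY NONCLUSTERED (id)
);

CREATE CLUSTERED INDEX CIX_your_table_queue
    ON your_table (status, enqueue_time);
```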

Remus Rusanu