I have a database table containing a queue of work items processed concurrently by multiple readers, with these requirements:

  • Each item should be processed by one reader only.
  • If a reader fails for any reason, the item should be returned to the queue for another reader to process.

Here is the pseudocode of a possible solution; each reader would do the following:


1) Read next item from queue and store it locally somewhere.

2) START TRANSACTION

3) Delete the item, to prevent other readers from seeing it. If the deletion fails, another reader has already pulled the item, so go back to step 1.

4) Process the item (stored locally in step 1). This could be long running.

5) COMMIT TRANSACTION: the item deletion is committed; go to step 1 to process the next item

OR

6) ROLLBACK TRANSACTION (explicit rollback or reader failure): the item deletion is rolled back and the item returns to the queue for another reader
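
For illustration, here is a minimal T-SQL sketch of these steps (the table name queue and the column names queueid and payload are assumptions for the example, not an actual schema):

DECLARE @id INT, @payload VARCHAR(MAX)

-- Step 1: read the next item and keep a local copy
SELECT TOP (1) @id = queueid, @payload = payload
FROM queue
ORDER BY queueid

BEGIN TRANSACTION  -- Step 2

DELETE FROM queue WHERE queueid = @id  -- Step 3
IF @@ROWCOUNT = 0
BEGIN
    -- another reader already pulled the item; go back to step 1
    ROLLBACK TRANSACTION
END
ELSE
BEGIN
    -- Step 4: process @payload (possibly long running)
    COMMIT TRANSACTION  -- Step 5; a rollback or reader failure is step 6
END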


My question is: what is the lowest isolation level I need to ensure that after step 3 (item deletion), other readers cannot see it? By lowest, I mean I want to maximize concurrency while maintaining integrity.

Note that I'm using SQL Server 2005, if it matters (I think this should be product-agnostic, right?)

Any other feedback on this approach in general is welcome.

+1  A: 

An alternative method for handling this that I've used is:

  1. Each work item has a column for State and another for ReaderID. State can be 'N'ew, 'A'ctive, or 'C'omplete. (I also use 'E'rror.)

  2. Readers each have their own ID.

  3. A Reader's first action is to claim the first queued item in state 'N' with a single SQL statement that sets the state to 'A' and the ReaderID to its own ID.

  4. The Reader then processes the work item, and when done, sets the state to 'C'.

For your purposes, you avoid issues with isolation by having the checkout action take place in one statement.

The SQL would be something like:

UPDATE queue
SET State = 'A', ReaderID = @myWorkerID
WHERE queueid = (SELECT MIN(queueid) FROM queue WHERE state = 'N')
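
After the checkout, the reader can fetch the row it just claimed and mark it complete when done; a minimal sketch (the payload column name is an assumption for illustration):

SELECT queueid, payload
FROM queue
WHERE ReaderID = @myWorkerID AND State = 'A'

-- ...and when processing finishes:
UPDATE queue SET State = 'C' WHERE ReaderID = @myWorkerID AND State = 'A'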

le dorfier
This is flawed using READ COMMITTED (the default)
Mitch Wheat
I think we've had this conversation. I still hold it's not flawed. Have you tested it? I have.
le dorfier
@le dorfier: I have voted down for now but will talk to some SQL experts, and get back to you...
Mitch Wheat
...instead I have removed downvote, will ask the question and then get back to you. I firmly believe you are wrong, but I'm willing to learn...
Mitch Wheat
I am too. The only thing worse than not knowing is "knowing" something that's wrong, and I'm as susceptible as anyone.
le dorfier
Mitch - the fundamental question I would have is, it seems to me that you are asserting that either a) Microsoft SQL Server statements and transactions do not comply, by default, with ACID requirements; or b) the process I describe above does not constitute transactions for ACID purposes. Yes/No?
le dorfier
With this approach... assuming it avoids the race condition between readers (which I gather is what's in contention here)... I would need a cleanup process for orphaned items (readers that pull an item and then crash)....
DSO
With my approach, when a reader fails, the DB will roll back the transaction automatically and the deleted item will return to the queue.
DSO
The difference is that you are holding a transaction open for the entire (long) period of the work, which is in general not a good thing. And if it's something about the work item that caused the reader to fail, you're just going to fail again on the next reader.
le dorfier
le dorfier: I like your approach of using a State column and a single statement. But you and Mitch seem to disagree about whether it will work, and that concerns me. Is the expected behavior formally documented somewhere? I don't think you can conclusively determine this via testing alone.
DSO
I come at the question from another direction than isolation level. I start from the fact that SQL Server asserts to support ACID transactions by default (see http://msdn.microsoft.com/en-us/library/ms190612(SQL.90).aspx).
le dorfier
From that description, and several years experience, (and also the belief that any SQL database would be fatally flawed if it didn't work that way), I think what I've described must work by default.
le dorfier
I once or twice have worked my way through the isolation level discussion and worked out what it all means, but it's been quite a while. I find that working from the assumption that ACID applies by default is sufficient, so I don't mess with isolation levels.
le dorfier
And I've seen others bitten by overaggressive adjustment (in either direction) that they didn't understand thoroughly. (But that still doesn't mean I'm absolutely sure I know I'm right, and why.)
le dorfier
I'm not convinced... it seems the "Isolation" part of ACID you are quoting is simply a textbook definition. In practice, databases offer different degrees of isolation, because you would have poor concurrency if you strictly followed the textbook definition.
DSO
If you choose the Serializable isolation level in SQL Server, I think that would meet the textbook definition of the I in ACID... however most people would be willing to trade off strict isolation in favor of performance.
DSO
+1  A: 

If, once you have removed the item from the 'Queue' mechanism, no process can access it, you don't need to keep the transaction open while you actually process the item.

REPEATABLE READ isolation level:

Specifies that statements cannot read data that has been modified but not yet committed by other transactions and that no other transactions can modify data that has been read by the current transaction until the current transaction completes.

Shared locks are placed on all data read by each statement in the transaction and are held until the transaction completes. This prevents other transactions from modifying any rows that have been read by the current transaction. Other transactions can insert new rows that match the search conditions of statements issued by the current transaction. If the current transaction then retries the statement it will retrieve the new rows, which results in phantom reads. Because shared locks are held to the end of a transaction instead of being released at the end of each statement, concurrency is lower than the default READ COMMITTED isolation level. Use this option only when necessary.
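
As a sketch only, a reader could set this level on its connection and, as suggested above, process the item after committing rather than inside the transaction (@id and queueid are illustrative names, not from the question):

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ

BEGIN TRANSACTION
    DELETE FROM queue WHERE queueid = @id  -- remove the item from the queue
COMMIT TRANSACTION

-- process the locally stored item outside the transaction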

Mitch Wheat
I'm holding the transaction open in case the reader fails for some reason, in which case I want the deletion to be undone so another reader can process it.
DSO
Why would READ COMMITTED not work? The MSDN link you sent says that with READ COMMITTED, statements cannot read data that has been modified but not committed by other transactions. Doesn't this mean that when a reader does a delete in a transaction, no other readers can read it?
DSO
A: 

If you use a row-level lock and the READPAST hint in your query, you can select just the rows that are not locked.

It looks like this:

SELECT queue FROM queue WITH (ROWLOCK, XLOCK, READPAST) WHERE state = 'N'

I suggest that you read up on locking hints in SQL Server, which will greatly assist you.
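
For illustration, here is a rough sketch of how these hints could fit the delete-based scheme in the question; the queueid and payload names are assumptions, and this combines the hints with the questioner's approach rather than reproducing a tested solution:

DECLARE @id INT, @payload VARCHAR(MAX)

BEGIN TRANSACTION

-- XLOCK holds an exclusive lock on the selected row until the transaction ends;
-- READPAST makes other readers skip locked rows instead of blocking on them
SELECT TOP (1) @id = queueid, @payload = payload
FROM queue WITH (ROWLOCK, XLOCK, READPAST)
ORDER BY queueid

DELETE FROM queue WHERE queueid = @id

-- process @payload, then COMMIT; a ROLLBACK (or reader failure) returns the item to the queue

COMMIT TRANSACTION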

Middletone
Unfortunately, for a lot of people it seems to also confuse you.
le dorfier