views:

72

answers:

4

I need to have a MsSql database table and another 8 (identical) processes accessing the same table in parallel - making a select top n, processing those n rows, and updating a column of those rows. The problem is that I need to select and process each row just once. This means that if one process got to the database and selected the top n rows, when the second process comes it should find those rows locked and select the rows from n to 2*n rows, and so on...

Is it possible to put a lock on some rows when you select them, and when someone requests top n rows which are locked to return the next rows, and not to wait for the locked ones? Seems like a long shot, but...

Another thing I was thinking - maybe not so elegant but sounds simple and safe, is to have in the database a counter for the instances which made selects on that table. The first instance that comes will increment the counter and select top n, the next one will increment the counter and select rows from n*(i-1) to n*i, and so on...

Does this sound like a good ideea? Do you have any better suggestions? Any thought is highly appreciated!

Thanks for your time.

+2  A: 

The most simplest method is to use row locking:

BEGIN TRAN

SELECT *
FROM authors
WITH (HOLDLOCK, ROWLOCK)
WHERE au_id = '274-80-9391'

/* Do all your stuff here while the record is locked */

COMMIT TRAN

But if you are accessing your data and then closing the connection, you won't be able to use this method.

How long will you be needing to lock the rows for? The best way might actually be as you say to place a counter on the rows you select (best done using OUTPUT clause within an UPDATE).

Codesleuth
I need to close the connection after the select, and processing the batch of rows will take some time (each row will determine sending an e-mail with a 500kb attachment).
Diana
I'd say you need to implement your own method of locking the rows (such as the counter) as row locking can block requests long enough that you will have timeout errors appearing elsewhere. Martin's comment looks like a way forward http://stackoverflow.com/questions/3636950/return-unlocked-rows-in-a-select-top-n-query#comment-3822218
Codesleuth
A: 

The best idea if you want to select records in this manner would be to use a counter in a separate table.

You really don't want to be locking rows on a production database exclusively for any great period of time, therefore I would recommend using a counter. This way only one of your processes would be able to grab that counter number at a time (as it will lock as it is being updated) which will give you the concurrency that you need.

If you need a hand writing the tables and procedures that will do this (simply and safely as you put it!) just ask.

JonVD
Ok, glad to see that the idea of the counter sounds good enough and no major flows can be seen on the first sight. I'll give it more thought for a day or so. Thanks for your promp response and help!
Diana
+2  A: 

Here's a sample I blogged about a while ago:

The READPAST hint is what ensures multiple processes don't block each other when polling for records to process. Plus, in this example I have a bit field to physically "lock" a record - could be a datetime if needed.

DECLARE @NextId INTEGER
BEGIN TRANSACTION

-- Find next available item available
SELECT TOP 1 @NextId = ID
FROM QueueTable WITH (UPDLOCK, READPAST)
WHERE IsBeingProcessed = 0
ORDER BY ID ASC

-- If found, flag it to prevent being picked up again
IF (@NextId IS NOT NULL)
    BEGIN
        UPDATE QueueTable
        SET IsBeingProcessed = 1
        WHERE ID = @NextId
    END

COMMIT TRANSACTION

-- Now return the queue item, if we have one
IF (@NextId IS NOT NULL)
    SELECT * FROM QueueTable WHERE ID = @NextId
AdaTheDev
Thanks for your anser, READPAST sounds like the MsSql answer I was looking for. If that SELECT TOP 1 @NextId = ID is a select top 3000 over several tables of million of records, it might take some time. Is it "guaranteed" by the READPAST that if I have another thread making the exact same select, the second one will pick up the next unblocked rows?
Diana
I tested your approach on a simple test table, works as a charm. But on my real table (preatty big, lots of indexes and statistics on it), it only works sometimes... Other times, doesn't skip the locked rows, it just "hangs" until the previous select locking the rows is over (To test this I added a w"aitfor delay '00:00:10'"). Any ideea why this is happening?
Diana
@Diana - How many rows are you trying to lock per process? e.g. perhaps a table lock is being taken (rowlocks will get upgraded to table locks over a certain threshold). If you could post up an example script that demonstrates how you are doing it (as obv. my example was for just a single record), that would be great - might be worth in a linked questions for maximum exposure
AdaTheDev
@AdaTheDev - I thought the "top 1500" or the "order by" from the select might be the problem, so I changed it to a "top 1" and removed the "order by". I also suspect the entire table is being locked. If not for the top 1500 or order by, maybe for some clustered indexes and statistics that are on that table...? I'll put another question on that. Thanks!
Diana
I posted another question concerning this problem: http://stackoverflow.com/questions/3683593/select-top-1-with-updlock-and-readpast-sets-exclusive-lock-on-the-entire-table
Diana
A: 

EDIT: ahh, nevermind, you're working in a disconnected style. How about this:

UPDATE TOP (@n) QueueTable SET Locked = 1
OUTPUT INSERTED.Col1, INSERTED.Col2 INTO @this
WHERE Locked = 0

<do your stuff>

Perhaps you are looking for the READPAST hint?

<begin or save transaction>

INSERT INTO @this (Col1, Col2) 
SELECT TOP (@n) Col1, Col2 
FROM Table1 WITH (ROWLOCK, HOLDLOCK, READPAST)

<do your stuff>

<commit or rollback>
Peter