views:

33

answers:

2

This is kind of a part 2 to this question: http://stackoverflow.com/questions/3319842/select-rows-and-update-same-rows-for-locking

I have a table in my database that hold millions of records. I have created a stored procedure to pull X amount of records and mark them as being locked so when my application calls the next set of X amount it will only pull the records that are not locked and so and on and so on… I am finding that with millions of records this is not very efficient. The queries are taking a while to run. Does anyone have some suggestions to increase my query efficiency while keeping the same idea for locking records? Below is the stored procedure:

Set Rowcount @topCount
 SELECT *
 into #temp_candidates
 FROM dbo.Candidates
 Where isTested = 0 and
       isLocked = 0 and 
       validated = 0 and
       validationId is null
 UPDATE dbo.Candidates
 SET islocked = 1,
 validationId = @validationId,
 validationDateTime = @validationDateTime
 WHERE id IN (SELECT id FROM #temp_candidates)
 Select *
 from dbo.Candidates
 where id in (SELECT id FROM #temp_candidates) IF EXISTS (SELECT NULL FROM
  tempdb.dbo.sysobjects WHERE ID = OBJECT_ID(N'tempdb..#temp_candidates')) 
   BEGIN 
     DROP TABLE #temp_candidates
   END 
END
+1  A: 

I'm not sure I understand the need for the temp database. Why not just set isLocked = 1 for whatever records you wish (inside a transaction). Do whatever you want to them. Then set isLocked = 1 where isLocked = 0 and repeat.

Don Dickinson
+2  A: 

This kind of procedure are guaranteed to fail under concurrency. What you want to to achieve is to use a table as a queue, which is notoriously error prone, and there are well know tested and tried methods to use Use Tables as Queues. Right now you are on a path that leads nowhere fast. On your previous question you should had selected Cade's answer.

To dequeue items form the table (lock them for processing in the presence of concurrent threads trying to do the same) several conditions have to be met, all outlined in the article I linked, and the actual query is only one of them. The most important condition is to actually have a correct clustered index. The isLocked field must be the leftmost field in the clustered key, followed by the ordering criteria.

CREATE CLUSTERED INDEX cdxCandidates ON Candidates(isLocked, ...);

This clustered index is required for efficient dequeue operations. Then you must use the OUTPUT clause, is the only way to pull this out correctly:

WITH cte AS (
  SELECT TOP (@x) isLocked, ...
  FROM Candidates WITH (READPAST)
  WHERE isLocked = 0
  ORDER BY ...)
UPDATE cte
  SET isLocked = 1
  OUTPUT INSERTED.*;

The Query Processor understands that you're selecting for update and will acquire the rows in the table in a manner that prevent a concurrent thread from acquiring the same rows.

Incidentally this 'scheme' is exactly the scheme (the clustered index condition and the update with OUTPUT on top of a READPAST SELECT) is used by Service Broker's QUEUEs, which were designed specifically for high concurrent enqueue/dequeue operations. Other alternatives were considered this is the only one that has a decent chance of getting any performance while maintaining correctness

Remus Rusanu
Thank you so much for the insight!
DDiVita
Since I have the isLocked in the where clause I should index it? I really don't care about the order in which I retrieve data, jsut that the rows are not locked.
DDiVita
I already answered in my post: isLocked has to be the leftmost column in the clustered index
Remus Rusanu