views: 177

answers: 4

In my answer to this SO question, I suggest using a single INSERT statement with a SELECT that increments the version number, as shown below.

INSERT INTO VersionTable
  (Id, VersionNumber, Title, Description, ...)
SELECT @ObjectId, MAX(VersionNumber) + 1, @Title, @Description, ...
FROM VersionTable
WHERE Id = @ObjectId
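
For reference, a minimal sketch of the table involved (the column types here are illustrative, not from the original):

CREATE TABLE VersionTable
(
    Id            int           NOT NULL,  -- id of the versioned object
    VersionNumber int           NOT NULL,  -- 1, 2, 3, ... per object
    Title         nvarchar(100) NOT NULL,
    Description   nvarchar(max) NULL
    -- ... remaining columns elided
)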

I suggested this because I believe the statement is safe in terms of concurrency: if another insert for the same object id runs at the same time, there is no chance of duplicate version numbers.

Am I correct?

+1  A: 

I think your assumption is incorrect. When you query the VersionNumber table, you are only putting a read lock on the row. This does not prevent other users from reading the same row from the same table. Therefore, it is possible for two processes to read the same row in the VersionNumber table at the same time and generate the same VersionNumber value.

Randy Minder
Sorry Randy, I had a typo in my SQL. The insert and the select are on the same table. Does that change your answer?
David Hall
@David - My answer doesn't change. The row being read from will still only have a read lock on it, not a write lock. Therefore, it is still possible that two processes could read the same row at the same time and generate the same ID value. To make this safe, I believe you need to set the isolation level to Serializable (or Repeatable Read) to prevent this from happening.
Randy Minder
+3  A: 

The default isolation level of READ COMMITTED makes this unsafe: if two of these run in perfect parallel, you will get a duplicate, since no read lock is held.

You need REPEATABLE READ or SERIALIZABLE isolation levels to make it safe.
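
As a sketch, the SERIALIZABLE variant would look like this (per the comment below, REPEATABLE READ turned out not to be enough in testing):

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE

BEGIN TRANSACTION

INSERT INTO VersionTable
  (Id, VersionNumber, Title, Description, ...)
SELECT @ObjectId, MAX(VersionNumber) + 1, @Title, @Description, ...
FROM VersionTable
WHERE Id = @ObjectId

COMMIT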

Paul Creasey
The insert can't run in parallel with the select, because it uses data from the select
Andomar
I mean two users concurrently executing the query.
Paul Creasey
`SERIALIZABLE` works, but `REPEATABLE READ` is definitely not enough (still causes duplicates in my test code).
Heinzi
A: 
  • You need a unique constraint on (Id, VersionNumber) to enforce it

  • I'd use ROWLOCK, XLOCK hints to stop other folk reading the locked row where you calculate the new version number

  • or wrap the INSERT in a TRY/CATCH. If I get a duplicate, try again... (sketched after this list)
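
A sketch of the constraint-plus-retry combination (the loop is illustrative, not gbn's code; the elided columns are left out):

ALTER TABLE VersionTable
  ADD CONSTRAINT UQ_VersionTable_Id_Version UNIQUE (Id, VersionNumber)

DECLARE @done bit = 0
WHILE @done = 0
BEGIN
  BEGIN TRY
    INSERT INTO VersionTable (Id, VersionNumber, Title, Description)
    SELECT @ObjectId, MAX(VersionNumber) + 1, @Title, @Description
    FROM VersionTable
    WHERE Id = @ObjectId

    SET @done = 1
  END TRY
  BEGIN CATCH
    -- 2627 = unique constraint violation: a concurrent insert won the
    -- race, so loop and recompute the version number
    IF ERROR_NUMBER() <> 2627
    BEGIN
      SET @done = 1
      RAISERROR('unexpected error during versioned insert', 16, 1)
    END
  END CATCH
END

Note that the duplicate computation can still happen under the default isolation level; the unique constraint just turns it into a catchable error instead of bad data.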

gbn
+8  A: 

As Paul writes: No, it's not safe, and I would like to add empirical evidence: Create a table Table_1 with one field ID and one record with value 0. Then execute the following code *simultaneously in two Management Studio query windows*:
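
That setup, as a script:

CREATE TABLE Table_1 (ID int)
INSERT INTO Table_1 (ID) VALUES (0)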

declare @counter int
set @counter = 0
while @counter < 1000
begin
  set @counter = @counter + 1

  INSERT INTO Table_1
    SELECT MAX(ID) + 1 FROM Table_1 

end

Then execute

SELECT ID, COUNT(*) FROM Table_1 GROUP BY ID HAVING COUNT(*) > 1

On my SQL Server 2008, one ID (662) was created twice. Thus, the default isolation level applied to single statements is not sufficient.


EDIT: Clearly, wrapping the INSERT with BEGIN TRANSACTION and COMMIT won't fix it, since the default isolation level for transactions is still READ COMMITTED, which is not sufficient. Note that setting the transaction isolation level to REPEATABLE READ is also not sufficient. The only way to make the above code safe is to add

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE

at the top. This, however, caused deadlocks every now and then in my tests.
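
Spelled out, that safe-but-deadlock-prone variant is the same loop with one line added at the top:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE

declare @counter int
set @counter = 0
while @counter < 1000
begin
  set @counter = @counter + 1

  INSERT INTO Table_1
    SELECT MAX(ID) + 1 FROM Table_1

end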

EDIT: The only solution I found that is safe and does not produce deadlocks (at least in my tests) is to explicitly lock the table exclusively (the default transaction isolation level is sufficient here). Beware though; this solution might kill performance:

...loop stuff...
    BEGIN TRANSACTION

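    -- dummy SELECT: touches no rows, but takes an exclusive table lock
    -- (TABLOCKX) and holds it until COMMIT (HOLDLOCK)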
    SELECT * FROM Table_1 WITH (TABLOCKX, HOLDLOCK) WHERE 1=0

    INSERT INTO Table_1
      SELECT MAX(ID) + 1 FROM Table_1 

    COMMIT
...loop end...
Heinzi
Thanks for this Heinzi, it looks quite definitive - I've been trying to think of a way of testing this myself, hadn't thought of using a loop.
David Hall
Oh - and there is +1 on the way from me tomorrow, I've run out of upvotes for today.
David Hall
I do not understand why this loop would produce duplicates; it certainly should not, imo. I'm going to test this!
Paul Creasey
@Paul: Why shouldn't it?
Heinzi
There is no parallelism or concurrency there, just a thousand sequential transactions occurring synchronously.
Paul Creasey
Why no concurrency? Both Management Studio query windows are executed in parallel...
Heinzi
oh, I totally missed the two windows part lol, major fail! Apologies!
Paul Creasey
No problems; thanks for the feedback. I've emphasized it now to avoid misunderstanding in the future.
Heinzi
Ah yes, REPEATABLE READ will indeed not work, since it only locks the rows that were actually read and does not prevent phantom inserts into the range. I'm surprised by the deadlocks though.
Paul Creasey
Me too. I thought deadlocks only occur when locks are placed by parallel executions *in different order*. I wonder how this could happen here.
Heinzi
Under the SERIALIZABLE model, a deadlock occurs like this: T1 selects MAX(...) and as a result places a range-S lock from BOF to EOF (i.e. the entire table, including the virtual start and end). T2 selects MAX(...) and also places the same range-S lock from BOF to EOF. T1 attempts to insert and blocks behind T2's range lock. T2 attempts to insert and blocks behind T1's range lock. Deadlock.
Remus Rusanu
@Remus, thanks for that, makes sense.
Paul Creasey
Btw, if there is an index on the version column, the range lock will only cover the last version number and the virtual end of the range, but that is enough: the insert conflicts with this range, so the deadlock still occurs.
Remus Rusanu
@Remus, thanks from me as well. So I guess an UPDLOCK or XLOCK hint on the SELECT might solve the deadlock problem...
Heinzi
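
For completeness, here is a sketch of that last, untested suggestion; the UPDLOCK hint takes the range lock in update mode, which two sessions cannot hold at once, so the second session waits instead of deadlocking:

BEGIN TRANSACTION

  -- UPDLOCK: range lock taken in update mode, serializing concurrent
  -- readers of the range; HOLDLOCK gives SERIALIZABLE (range) semantics
  -- for this table without changing the session isolation level
  INSERT INTO Table_1
    SELECT MAX(ID) + 1 FROM Table_1 WITH (UPDLOCK, HOLDLOCK)

COMMIT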