ansaurus

Question

Atomic UPSERT in SQL Server 2005

Answer 1

+1 A:

One trick I've seen is to try the INSERT and, if it fails, perform the UPDATE.

Marcelo Cantos 2010-03-26 11:57:58

Conditional atomic inserts are generally faster than a TRY CATCH block. See here for some benchmarking: http://stackoverflow.com/questions/1688618/sql-insert-but-avoid-duplicates/1689104#1689104

Peter 2010-03-27 01:04:51

Answer 2

A:

You could use application locks: (sp_getapplock) http://msdn.microsoft.com/en-us/library/ms189823.aspx

thijs 2010-03-26 12:45:42

Answer 3

+5 A:

INSERT INTO <table>
SELECT <natural keys>, <other stuff...>
FROM <table>
WHERE NOT EXISTS
   -- race condition risk here?
   ( SELECT 1 FROM <table> WHERE <natural keys> )

UPDATE ...
WHERE <natural keys>

there is a race condition in the first INSERT. The key may not exists during the inner query SELECT, but does exist at INSERT time resulting in key violation.
there is a race condition between the INSERT and UPDATE. The key may exist when checked in the inner query of the INSERT but is gone by the time UPDATE runs.

For the second race condition one could argue that the key would have been deleted anyway by the concurrent thread, so it's not really a lost update.

The optimal solution is usually to try the most likely case, and handle the error if it fails (inside a transaction, of course):

if the key is likely missing, always insert first. Handle the unique constraint violation, fallback to update.
if the key is likely present, always update first. Insert if no row was found. Handle possible unique constraint violation, fallback to update.

Besides correctness, this pattern is also optimal for speed: is more efficient to try to insert and handle the exception than to do spurious lockups. Lockups mean logical page reads (which may mean physical page reads), and IO (even logical) is more expensive than SEH.

Update @Peter

Why isn't a single statement 'atomic'? Let's say we have a trivial table:

create table Test (id int primary key);

Now if I'd run this single statement from two threads, in a loop, it would be 'atomic', as you say, an no race condition can exist:

  insert into Test (id)
    select top (1) id
    from Numbers n
    where not exists (select id from Test where id = n.id);

Yet in only a couple of seconds, a primary key violation occurs:

Msg 2627, Level 14, State 1, Line 4
Violation of PRIMARY KEY constraint 'PK__Test__24927208'. Cannot insert duplicate key in object 'dbo.Test'.

Why is that? You are correct in that SQL query plan will do the 'right thing' on DELETE ... FROM ... JOIN, on WITH cte AS (SELECT...FROM ) DELETE FROM cte and in many other cases. But there is a crucial difference in these cases: the 'subquery' refers to the target of an update or delete operation. For such cases the query plan will indeed use an appropriate lock, in fact I this behavior is critical on certain cases, like when implementing queues Using tables as Queues.

But in the original question, as well as in my example, the subquery is seen by the query optimizer just as a subquery in a query, not as some special 'scan for update' type query that needs special lock protection. The result is that the execution of the subquery lookup can be observed as a distinct operation by a concurent observerver, thus breaking the 'atomic' behavior of the statement. Unless special precaution is taken, multiple threads can attempt to insert the same value, both convinced they had checked and the value doesn't already exists. Only one can succeed, the other will hit the PK violation. QED.

Remus Rusanu 2010-03-26 17:29:25

This is wrong. The conditional insert w/ where clause is atomic. There is no race condition.

Peter 2010-03-27 00:31:52

@Peter: just because is inside an auto-transaction doesn't make it atomic.

Remus Rusanu 2010-03-27 01:33:19

If you provide a citation I might be convinced, but it doesn't make sense of the face of it. What guarantees atomicity in the multi-statement explicit transaction? The transaction. How is that transaction different than a single-statement transaction, implicit or otherwise? Is the INSERT ... WHERE unwrapped into two transactions? No. So either it is atomic, or transactions do not guarantee atomicity. If transactions do not guarantee atomicity, then why bother wrapping your UPDATE ... INSERT in a transaction at all?

Peter 2010-03-28 03:11:58

The subquery scans and crabs S-locks which are releases asap. There isn't anythig to quard from another thread deleting the row this thread just read. This is why the UPDLOCK in the accepted answer is important.

Remus Rusanu 2010-03-28 05:12:38

@Remus, the correlated subquery executes in the context of the INSERT statement, and SQL Server will raise whatever locks are necessary to ensure atomicity. If this is not the case, please provide links to documentation or testing that bears this out. As a related question, if `INSERT ... WHERE NOT EXISTS ()` is not atomic, is `MERGE` atomic? How about `UPDATE ... JOIN`? `DELETE .... JOIN`? `DELETE ... WHERE`?

Peter 2010-03-28 16:31:30

@Peter: See my update

Remus Rusanu 2010-03-29 01:30:09

Shoot, you're right. I'm glad to have been corrected on an important issue like this, but now I have much code to fix, and I have yet to work out an optimal set of table hints/query constructs for conditional inserts. `HOLOCK, TABLOCKX` seems to work, but gawd. Thank you Remus!

Peter 2010-03-30 01:54:59

fyi, in a trivial script like the one above, `INSERT INTO Test WITH (TABLOCKX, HOLDLOCK)` was not just concurrency-friendly, but also showed 5x the insert volume of `INSERT INTO Test` with no locking hints. I am surprised for now.

Peter 2010-03-30 02:44:01

TABLOCKX is overkill, as is so broad it limits scalability. HOLDLOCK will not help as it applies to S-locks only and will *not* prevent PK violation. The appropriate hint is UPDLOCK. For high concurrency the ROWLOCK hint may also help.

Remus Rusanu 2010-03-30 06:18:40

I'll try UPDLOCK again, but on the target table of a conditional insert w/ correlated subquery, it did not prevent PK violations. `TABLOCKX, HOLDLOCK` did, or it did so long as the scripts ran. I hope to work out some combination of hints that would allow a conditional insert to work as I had imagined it worked. In reviewing the locks, I'm not sure one can get away with less than a TABLOCK, as you are comparing the new value to an entire domain. But, we shall see.

Peter 2010-03-30 06:44:26

Answer 4

+2 A:

I would put your select approach and solve that problem with a transaction

BEGIN TRANSACTION
IF (SELECT COUNT(*) FROM MyTable WITH(UPDLOCK) WHERE <condition>) = 0
  -- race condition risk here
  INSERT ...
ELSE
  UPDATE ...
COMMIT TRANSACTION

I use a UPDLOCK hint in this example. http://msdn.microsoft.com/en-us/library/ms187373.aspx

The transaction is short enough so it will not block your application.

Arthur 2010-03-26 17:58:06

unless you make this whole transaction serilizable with SET TRANSACTION ISOLATION LEVEL SERIALIZABLE you're not solving anything.

Mladen Prajdic 2010-03-28 16:29:42

Answer 5

+2 A:

EDIT: Remus is correct, the conditional insert w/ where clause does not guarantee a consistent state between the correlated subquery and table insert.

Perhaps the right table hints could force a consistent state. INSERT <table> WITH (TABLOCKX, HOLDLOCK) seems to work, but I have no idea if that is the optimal level of locking for a conditional insert.

In a trivial test like the one Remus described, TABLOCKX, HOLDLOCK showed ~5x the insert volume of no table hints, and without the PK errors or course.

ORIGINAL ANSWER, INCORRECT:

Is this atomic?

Yes, the conditional insert w/ where clause is atomic, and your INSERT ... WHERE NOT EXISTS() ... UPDATE form is the proper way to perform an UPSERT.

I would add IF @@ROWCOUNT = 0 between the INSERT and UPDATE:

INSERT INTO <table>
SELECT <natural keys>, <other stuff...>
WHERE NOT EXISTS
   -- no race condition here
   ( SELECT 1 FROM <table> WHERE <natural keys> )

IF @@ROWCOUNT = 0 BEGIN
  UPDATE ...
  WHERE <natural keys>
END

Single statements always execute within a transaction, either their own (autocommit and implicit) or together with other statements (explicit).

Peter 2010-03-27 01:01:09

Thanks for those resources! As Remus points out, that doesn't guarantee it's atomic, though, so I'm going to have to go with Arthur's explicit lock approach even though it's uglier IMO :(

rabidpebble 2010-03-27 13:24:33

@rabidpebble: as far as I know, Remus is wrong. Statements are *always* executed within a transaction, and the transaction guarantees atomicity. If it didn't, why would we ever bother w/ multi-statement explicit transactions?

Peter 2010-03-28 03:26:56

Answer 6

+1 A:

Pass updlock, rowlock, holdlock hints when testing for existence of the row. Holdlock ensures that all inserts are serialised; rowlock permits concurrent updates to existing rows.

Updates may still block if your PK is a bigint, as the internal hashing is degenerate for 64-bit values.

begin tran -- default read committed isolation level is fine

if not exists (select * from <table> with (updlock, rowlock, holdlock) where <PK = ...>
    -- insert
else
    -- update

commit

Cassius Porcus 2010-04-15 13:03:11

ansaurus

tags:

views:

answers:

Atomic UPSERT in SQL Server 2005

related questions