views:

157

answers:

3

I'm inserting into a SQL database from multiple processes. It's likely that the processes will sometimes try to insert duplicate data into the table. I've tried to write the query in a way that will handle the duplicates but I still get:

System.Data.SqlClient.SqlException: Violation of UNIQUE KEY constraint 'UK1_MyTable'. Cannot insert duplicate key in object 'dbo.MyTable'.
The statement has been terminated.

My query looks something like:

INSERT INTO MyTable (FieldA, FieldB, FieldC)
SELECT FieldA='AValue', FieldB='BValue', FieldC='CValue'
WHERE (SELECT COUNT(*) FROM MyTable WHERE FieldA='AValue' AND FieldB='BValue' AND FieldC='CValue' ) = 0

The constraint 'UK1_MyConstraint' says that in MyTable, the combination of the 3 fields should be unique.

My questions:

  1. Why doesn't this work?
  2. What modification do I need to make so there is no chance of an exception due to the constraint violation?

Note that I'm aware that there are other approaches to solving the original problem of "INSERT if not exists" such as (in summary):

  • Using TRY CATCH
  • IF NOT EXIST INSERT (inside a transaction with serializable isolation)

Should I be using one of the approaches?

Edit 1 SQL for Creating Table:

CREATE TABLE [dbo].[MyTable](
  [Id] [bigint] IDENTITY(1,1) NOT NULL,
  [FieldA] [bigint] NOT NULL,
  [FieldB] [int] NOT NULL,
  [FieldC] [char](3) NULL,
  [FieldD] [float] NULL,
  CONSTRAINT [PK_MyTable] PRIMARY KEY NONCLUSTERED 
  (
    [Id] ASC
  )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON),
  CONSTRAINT [UK1_MyTable] UNIQUE NONCLUSTERED 
  (
    [FieldA] ASC,
    [FieldB] ASC,
    [FieldC] ASC
  )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON)
)

Edit 2 Decision:

Just to update this - I've decided to use the "JFDI" implementation suggested in the linked question (link). Although I'm still curious as to why the original implementation doesn't work.

+1  A: 

Off the top of my head, I have a feeling one or more of those columns accepts nulls. I would like to see the create statement for the table including the constraint.

Mike Cheel
Query for creating table added as requested.
Iain
@Adrift - It is perfectly normal for queries such as the OP's to fail under load. There is no guarantee that 2 concurrent transactions won't both do the `SELECT COUNT` bit at the same time (both will get `S` locks that don't conflict) then proceed to do an insert with the same values at which point the second one will fail.
Martin Smith
I can easily reproduce this without using nulls. I've setup a simple test console app that executes the query using hardcoded value. If I run 20 of these at the same time at-least 1 of them encounters the problem.
Iain
For FieldC (which accepts NULLs) are you changeing the = sign to IS when the assigning value is NULL?
Mike Cheel
@Mike, I'm not sure the assigning value will ever be null. In which case you should ask "why does it accept nulls?" - to which I reply - "don't know, will look into it". :-)
Iain
+2  A: 

RE: "I'm still curious as to why the original implementation doesn't work."

Why would it work?

What is there to prevent two concurrent transactions being interleaved as follows?

Tran A                                Tran B
---------------------------------------------
SELECT COUNT(*)...
                                  SELECT COUNT(*)...
INSERT ....
                                  INSERT... (duplicate key violation).

The only time conflicting locks will be taken is at the Insert stage.

To see this in SQL Profiler

Create Table Script

create table MyTable
(
FieldA int NOT NULL, 
FieldB int NOT NULL, 
FieldC int NOT NULL
)
create unique nonclustered index ix on  MyTable(FieldA, FieldB, FieldC)

Then paste the below into two different SSMS windows. Take a note of the spids of the connections (x and y) and set up a SQL Profiler Trace capturing locking events and user error messages. Apply filters of spid=x or y and severity = 0 and then execute both scripts.

Insert Script

DECLARE @FieldA INT, @FieldB INT, @FieldC INT
SET NOCOUNT ON
SET CONTEXT_INFO 0x696E736572742074657374

BEGIN TRY
WHILE 1=1
    BEGIN

        SET @FieldA=( (CAST(GETDATE() AS FLOAT) - FLOOR(CAST(GETDATE() AS FLOAT))) * 24 * 60 * 60 * 300)
        SET @FieldB = @FieldA
        SET @FieldC = @FieldA

        RAISERROR('beginning insert',0,1) WITH NOWAIT
        INSERT INTO MyTable (FieldA, FieldB, FieldC)
        SELECT FieldA=@FieldA, FieldB=@FieldB, FieldC=@FieldC
        WHERE (SELECT COUNT(*) FROM MyTable WHERE FieldA=@FieldA AND FieldB=@FieldB AND FieldC=@FieldC ) = 0
    END
END TRY
BEGIN CATCH
    DECLARE @message VARCHAR(500)
    SELECT @message = 'in catch block ' + ERROR_MESSAGE()
    RAISERROR(@message,0,1) WITH NOWAIT
    DECLARE @killspid VARCHAR(10) 
    SELECT @killspid = 'kill ' +CAST(SPID AS VARCHAR(4)) FROM sys.sysprocesses WHERE SPID!=@@SPID AND CONTEXT_INFO = (SELECT CONTEXT_INFO FROM sys.sysprocesses WHERE SPID=@@SPID)
    EXEC ( @killspid )
END CATCH
Martin Smith
This is exactly what I'm looking for - deeper understanding of what is actually going on. I did a test using UPDLOCK and HOLDLOCK (based on an answer to the linked question) on the Sub-Query and that resolved the problem also - but I don't understand why.
Iain
@Iain - to get an insight into this you can use SQL Profiler to trace the various locking events that happen during the execution of the statement.
Martin Smith
+4  A: 

Why doesn't this work?

I believe the default behaviour of SQL Server is to release shared locks as soon as they are no longer needed. Your sub-query will result in a short-lived shared (S) lock on the table, which will be released as soon as the sub-query completes.

At this point there is nothing to prevent a concurrent transaction from inserting the very row you just verified was not present.

What modification do I need to make so there is no chance of an exception due to the constraint violation?

Adding the HOLDLOCK hint to your sub-query will instruct SQL Server to hold on to the lock until the transaction is completed. (In your case, this is an implicit transaction.) The HOLDLOCK hint is equivalent to the SERIALIZABLE hint, which itself is equivalent to the serializable transaction isolation level which you refer in your list of "other approaches".

The HOLDLOCK hint alone would be sufficient to retain the S lock and prevent a concurrent transaction from inserting the row you are guarding against. However, you will likely find your unique key violation error replaced by deadlocks, occurring at the same frequency.

If you're retaining only an S lock on the table, consider a race between two concurrent attempts to insert the same row, proceeding in lockstep -- both succeed in acquiring an S lock on the table, but neither can succeed in acquiring the Exclusive (X) lock required to execute the insert.

Luckily there is another lock type for this exact scenario, called the Update (U) lock. The U lock is identical to an S lock with the following difference: whilst multiple S locks can be held simultaneously on the same resource, only one U lock may be held at a time. (Said another way, whilst S locks are compatible with each other (i.e. can coexist without conflict), U locks are not compatible with each other, but can coexist alongside S locks; and further along the spectrum, Exclusive (X) locks are not compatible with either S or U locks)

You can upgrade the implicit S lock on your sub-query to a U lock using the UPDLOCK hint.

Two concurrent attempts to insert the same row in the table will now be serialized at the initial select statement, since this acquires (and holds) a U lock, which is not compatible with another U lock from the concurrent insertion attempt.

NULL values

A separate problem may arise from the fact that FieldC allows NULL values.

If ANSI_NULLS is on (default) then the equality check FieldC=NULL would return false, even in the case where FieldC is NULL (you must use the IS NULL operator to check for null when ANSI_NULLS is on). Since FieldC is nullable, your duplicate check will not work when inserting a NULL value.

To correctly deal with nulls you will need to modify your EXISTS sub-query to use the IS NULL operator rather than = when a value of NULL is being inserted. (Or you can change the table to disallow NULLs in all the concerned columns.)

SQL Server Books Online References

Daniel Fortunov
+1 Nice answer.
Martin Smith
There's an answer to a similar question that also suggests ROWLOCK: "rowlock permits concurrent updates to existing rows."Is this true?http://stackoverflow.com/questions/639854/tsql-check-if-a-row-exists-otherwise-insert/2645436#2645436
Iain