views:

203

answers:

7

Hello,

I had always used something similar to the following to achieve it:

INSERT INTO TheTable
SELECT
    @primaryKey,
    @value1,
    @value2
WHERE
    NOT EXISTS
    (SELECT
        NULL
    FROM
        TheTable
    WHERE
        PrimaryKey = @primaryKey)

...but once under load, a primary key violation occurred. This is the only statement which inserts into this table at all. So does this mean that the above statement is not atomic?

The problem is that this is almost impossible to recreate at will.

Perhaps I could change it to the something like the following:

INSERT INTO TheTable
WITH
    (HOLDLOCK,
    UPDLOCK,
    ROWLOCK)
SELECT
    @primaryKey,
    @value1,
    @value2
WHERE
    NOT EXISTS
    (SELECT
        NULL
    FROM
        TheTable
    WITH
        (HOLDLOCK,
        UPDLOCK,
        ROWLOCK)
    WHERE
        PrimaryKey = @primaryKey)

Although, maybe I'm using the wrong locks or using too much locking or something.

I have seen other questions on stackoverflow.com where answers are suggesting a "IF (SELECT COUNT(*) ... INSERT" etc., but I was always under the (perhaps incorrect) assumption that a single SQL statement would be atomic.

Does anyone have any ideas?

Thanks.

Adam

A: 

I don't know if this is the "official" way, but you could try the INSERT, and fall back to UPDATE if it fails.

Marcelo Cantos
A: 

I would go with the solution you proposed

Vidar Nordnes
+9  A: 

I added HOLDLOCK which wasn't present originally. Please disregard the version without this hint.

As far as I'm concerned, this should be enough:

INSERT INTO TheTable 
SELECT 
    @primaryKey, 
    @value1, 
    @value2 
WHERE 
    NOT EXISTS 
    (SELECT 0
     FROM TheTable WITH (UPDLOCK, HOLDLOCK)
     WHERE PrimaryKey = @primaryKey) 

Also, if you actually want to update a row if it exists and insert if it doesn't, you might find this question useful.

GSerg
What are you locking when the row doesn't exist?
Martin Smith
A relevant range in the index (the primary key in this case).
GSerg
@GSerg Agreed. The pessimistic/optimistic locking of the select statement needs a directive.
DaveWilliamson
@Gserg - That makes sense.
Martin Smith
Thanks. This makes sense to me too. Although when I originally wrote the statement I naively assumed that something magical would happen inside the server!
Adam
I've tried playing around with Profiler tracing the lock events for both the original version and the version with the hint. http://img405.imageshack.us/img405/1623/locksx.png The only difference I can see is that one locks the page with IS mode and the other in IU mode so I'm not sure this will make any difference? (As IS mode is meant to be incompatible with another IS mode lock on the same object anyway according to http://msdn.microsoft.com/en-us/library/ms186396.aspx)
Martin Smith
Two IS's cause no conflict. Can you replace the IDs with object names?
GSerg
@GSerg - Ah sorry I thought `N` was Not Compatible. Regardless it is the same compatibility on the matrix as for 2 `IU` s anyway and the `UPDLOCK` don't seem to take out any additional range locks. Profiler doesn't show these object names but the short one is the table name and the long one is found in both `master.sys.allocation_units` and `sys.partitions`
Martin Smith
The point is, two `U`'s are not compatible. In your screenshot there are no `U`'s, which is probably because you table doesn't have an index on the `id` column. The updlock places a `KEY` U-lock on that index.
GSerg
@GSerg - It is the PK in my test (With clustered index). Do you see something different when you look at the locks your end then? And are you testing the case that the row doesn't already exist?
Martin Smith
I'm testing both cases, and there's always a `U` lock. Can you do the same little test on your side (see my edited answer)?
GSerg
The issue with your test (I think) is it is not the same as the OP's original situation. The OP only has one statement and the pattern of lock release seems to be the same regardless of hint for that. Without the hint the locks are released at the end of the statement and before the delay with the hint they are released at the end of the whole transaction. I definitely only see `U` locks when I am running it and the `where id=4` bit matches a record. What are they on? a range?
Martin Smith
I believe the trick is that `IU` is not compatible with `U`. The first connection will place an `IU` on the index page because the row does not exist yet, the second will have to place a `U` on the row, which will result in waiting.
GSerg
I don't think there's anything stopping 2 concurrent transactions doing `SELECT 0 FROM Table1 with (updlock) WHERE PrimaryKey = x` *as long as `x` doesn't exist* (I just tested that in isolation and the 2 transactions didn't block each other) but I've bugged you enough about this now. Sorry!
Martin Smith
No, keep bugging, I sometimes find myself *feeling* what is correct instead of *knowing* it. This is the case. Side note: SQL Server has an interesting effect where two transactions will not block each other even if each of them tries to get an exclusive lock on the same object, *provided that no actual update occurs* (a trick to improve concurrency). We should have a nice stress testing tomorrow.
GSerg
The testing shows that two `select 'Done.' where exists(select 0 from foo_testing with(updlock) where id = 4);`, *provided `id=4` does not exist*, don't conflict with each other, which means my original answer was actually wrong. The solution is to add the `HOLDLOCK` hint. See the edited answer. Thanks for keeping me bugged :)
GSerg
There's a great explanation of why this locking is required in Daniel's answer to my (very similar) question: http://stackoverflow.com/questions/3789287/violation-of-unique-key-constraint-on-insert-where-count-0-on-sql-server-200/3791506#3791506
Iain
A: 

I've done a similiar operation in past using a different method. First, I declare a variaable to hold the primary key. Then I populate that variable with the output of a select statement which look sfor a record with those values. Then I do and IF statement. If primary key is null, then do insert, else, return some error code.

     DECLARE @existing varchar(10)
    SET @existing = (SELECT primaryKey FROM TABLE WHERE param1field = @param1 AND param2field = @param2)

    IF @existing is not null
    BEGIN
    INSERT INTO Table(param1Field, param2Field) VALUES(param1, param2)
    END
    ELSE
    Return 0
END
Marc
Why not just do:IF NOT EXISTS (SELECT * FROM TABLE WHERE param1field = @param1 AND param2field = @param2)BEGIN INSERT INTO Table(param1Field, param2Field) VALUES(param1, param2)END
Vidar Nordnes
Yeah, but that looks like it's open to concurrency issues (i.e. what if something happens on another connection between your select and your insert?)
Adam
@Adam Marc's code above isn't any better for avoiding locking issues. The only two ways to handle concurrency issues are to lock using WITH (UPDLOCK, HOLDLOCK) or to handle the insert error and convert it to an update.
Emtucifor
+5  A: 

What about the "JFDI" pattern?

BEGIN TRY
   INSERT etc
END TRY
BEGIN CATCH
    IF ERROR_NUMBER() <> 2627
      RAISERROR etc
END CATCH

Seriously, this is quickest and the most concurrent without locks, especially at high volumes. What if the UPDLOCK is escalated and the whole table is locked?

Read lesson 4

gbn
+1 for the scalability of this solution, though one should be cautious of using this pattern in the general case outside the poster's specific situation.
Dave Markle
Thanks - okay, I agree that this is probably what I will end up using, and is the answer to the actual question.
Adam
I know it's bad to rely on errors like this, but I wonder if doing this with just a straight `INSERT` (without the `EXISTS`) would perform better (i.e. try insert no matter what and just ignore error 2627).
Adam
That depends on whether you mostly insert values that don't exist or mostly values that *do* exist. In the latter case, I'd argue the performance will be poorer due to tons of exceptions being raised and ignored.
GSerg
@Gserg: correct. But then OP would have posted an INSERT/UPDATE question, not test for inert arguably. We use this to filter out a few thousand duplicates in dozen million new rows per day
gbn
+4  A: 

You could use MERGE:

MERGE INTO Target
USING (VALUES (@primaryKey, @value1, @value2)) Source (key, value1, value2)
ON Target.key = Source.key
WHEN MATCHED THEN
    UPDATE SET value1 = Source.value1, value2 = Source.value2
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Name, ReasonType) VALUES (@primaryKey, @value1, @value2)
Chris Smith
+2  A: 

The simplest solution, in my opinion, is to simply use the statement

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

before executing your statement. This requires no locking hints to achieve your desired output, which are a bit of a "code smell", and can become tough to manage if queries get sufficiently complex.

You were close when you asked if the operation is atomic. It is, but your problem is that it's not sufficiently isolated from other statements that run.

You might ask yourself why this occurs. The answer is scalability. Operations that aren't as isolated don't require the level of locks as those that are more isolated. By being "looser" with the locks, you cut the lock contention on the system and the system becomes more scalable.

When you increase your transaction isolation level to REPEATABLE READ, the database server automatically puts the proper locks on objects to ensure that no other transaction can change what your transaction reads while it's reading it. By putting the UPDLOCK hint on the table, you have essentially manually increased the isolation level of your transaction.

I hope this clears up a bit of conceptual stuff. Oh, and one more thing. If you're using connection pooling, make sure to set the transaction isolation level back to READ COMMITTED (which is usually the default) at the end of your statement, or future statements executed against that pooled connection may also run in the higher isolation level, which could cause your system to scale poorly.

(also see gbn's answer, which has the benefit of using less locks, but for your specific situation should be adequate)

Dave Markle
Thanks - I take your point about locking hints being smelly. I'll definately just use your suggestion for the general case, but I think I'm gonna go with gbn's more direct approach in this particular case.
Adam