ansaurus

Question

Only inserting a row if it's not already there

Answer 1

A:

I don't know if this is the "official" way, but you could try the INSERT, and fall back to UPDATE if it fails.

Marcelo Cantos 2010-08-04 16:59:46

Answer 2

A:

I would go with the solution you proposed

Vidar Nordnes 2010-08-04 17:01:01

Answer 3

+9 A:

I added HOLDLOCK which wasn't present originally. Please disregard the version without this hint.

As far as I'm concerned, this should be enough:

INSERT INTO TheTable 
SELECT 
    @primaryKey, 
    @value1, 
    @value2 
WHERE 
    NOT EXISTS 
    (SELECT 0
     FROM TheTable WITH (UPDLOCK, HOLDLOCK)
     WHERE PrimaryKey = @primaryKey)

Also, if you actually want to update a row if it exists and insert if it doesn't, you might find this question useful.

GSerg 2010-08-04 17:01:03

What are you locking when the row doesn't exist?

Martin Smith 2010-08-04 17:04:04

A relevant range in the index (the primary key in this case).

GSerg 2010-08-04 17:05:29

@GSerg Agreed. The pessimistic/optimistic locking of the select statement needs a directive.

DaveWilliamson 2010-08-04 17:06:07

@Gserg - That makes sense.

Martin Smith 2010-08-04 17:10:56

Thanks. This makes sense to me too. Although when I originally wrote the statement I naively assumed that something magical would happen inside the server!

Adam 2010-08-04 17:18:18

I've tried playing around with Profiler tracing the lock events for both the original version and the version with the hint. http://img405.imageshack.us/img405/1623/locksx.png The only difference I can see is that one locks the page with IS mode and the other in IU mode so I'm not sure this will make any difference? (As IS mode is meant to be incompatible with another IS mode lock on the same object anyway according to http://msdn.microsoft.com/en-us/library/ms186396.aspx)

Martin Smith 2010-08-04 19:48:20

Two IS's cause no conflict. Can you replace the IDs with object names?

GSerg 2010-08-04 20:06:46

@GSerg - Ah sorry I thought `N` was Not Compatible. Regardless it is the same compatibility on the matrix as for 2 `IU` s anyway and the `UPDLOCK` don't seem to take out any additional range locks. Profiler doesn't show these object names but the short one is the table name and the long one is found in both `master.sys.allocation_units` and `sys.partitions`

Martin Smith 2010-08-04 20:30:43

The point is, two `U`'s are not compatible. In your screenshot there are no `U`'s, which is probably because you table doesn't have an index on the `id` column. The updlock places a `KEY` U-lock on that index.

GSerg 2010-08-04 20:58:45

@GSerg - It is the PK in my test (With clustered index). Do you see something different when you look at the locks your end then? And are you testing the case that the row doesn't already exist?

Martin Smith 2010-08-04 21:01:49

I'm testing both cases, and there's always a `U` lock. Can you do the same little test on your side (see my edited answer)?

GSerg 2010-08-04 21:08:35

The issue with your test (I think) is it is not the same as the OP's original situation. The OP only has one statement and the pattern of lock release seems to be the same regardless of hint for that. Without the hint the locks are released at the end of the statement and before the delay with the hint they are released at the end of the whole transaction. I definitely only see `U` locks when I am running it and the `where id=4` bit matches a record. What are they on? a range?

Martin Smith 2010-08-04 21:30:25

I believe the trick is that `IU` is not compatible with `U`. The first connection will place an `IU` on the index page because the row does not exist yet, the second will have to place a `U` on the row, which will result in waiting.

GSerg 2010-08-04 22:05:13

I don't think there's anything stopping 2 concurrent transactions doing `SELECT 0 FROM Table1 with (updlock) WHERE PrimaryKey = x` *as long as `x` doesn't exist* (I just tested that in isolation and the 2 transactions didn't block each other) but I've bugged you enough about this now. Sorry!

Martin Smith 2010-08-04 22:51:46

No, keep bugging, I sometimes find myself *feeling* what is correct instead of *knowing* it. This is the case. Side note: SQL Server has an interesting effect where two transactions will not block each other even if each of them tries to get an exclusive lock on the same object, *provided that no actual update occurs* (a trick to improve concurrency). We should have a nice stress testing tomorrow.

GSerg 2010-08-04 23:16:06

The testing shows that two `select 'Done.' where exists(select 0 from foo_testing with(updlock) where id = 4);`, *provided `id=4` does not exist*, don't conflict with each other, which means my original answer was actually wrong. The solution is to add the `HOLDLOCK` hint. See the edited answer. Thanks for keeping me bugged :)

GSerg 2010-08-05 12:01:25

There's a great explanation of why this locking is required in Daniel's answer to my (very similar) question: http://stackoverflow.com/questions/3789287/violation-of-unique-key-constraint-on-insert-where-count-0-on-sql-server-200/3791506#3791506

Iain 2010-09-25 08:35:34

Answer 4

A:

I've done a similiar operation in past using a different method. First, I declare a variaable to hold the primary key. Then I populate that variable with the output of a select statement which look sfor a record with those values. Then I do and IF statement. If primary key is null, then do insert, else, return some error code.

     DECLARE @existing varchar(10)
    SET @existing = (SELECT primaryKey FROM TABLE WHERE param1field = @param1 AND param2field = @param2)

    IF @existing is not null
    BEGIN
    INSERT INTO Table(param1Field, param2Field) VALUES(param1, param2)
    END
    ELSE
    Return 0
END

Marc 2010-08-04 17:06:14

Why not just do:IF NOT EXISTS (SELECT * FROM TABLE WHERE param1field = @param1 AND param2field = @param2)BEGIN INSERT INTO Table(param1Field, param2Field) VALUES(param1, param2)END

Vidar Nordnes 2010-08-04 17:29:21

Yeah, but that looks like it's open to concurrency issues (i.e. what if something happens on another connection between your select and your insert?)

Adam 2010-08-04 17:31:36

@Adam Marc's code above isn't any better for avoiding locking issues. The only two ways to handle concurrency issues are to lock using WITH (UPDLOCK, HOLDLOCK) or to handle the insert error and convert it to an update.

Emtucifor 2010-10-02 01:06:07

Answer 5

+5 A:

What about the "JFDI" pattern?

BEGIN TRY
   INSERT etc
END TRY
BEGIN CATCH
    IF ERROR_NUMBER() <> 2627
      RAISERROR etc
END CATCH

Seriously, this is quickest and the most concurrent without locks, especially at high volumes. What if the UPDLOCK is escalated and the whole table is locked?

Read lesson 4

gbn 2010-08-04 17:42:31

+1 for the scalability of this solution, though one should be cautious of using this pattern in the general case outside the poster's specific situation.

Dave Markle 2010-08-04 18:20:47

Thanks - okay, I agree that this is probably what I will end up using, and is the answer to the actual question.

Adam 2010-08-04 19:58:02

I know it's bad to rely on errors like this, but I wonder if doing this with just a straight `INSERT` (without the `EXISTS`) would perform better (i.e. try insert no matter what and just ignore error 2627).

Adam 2010-08-04 20:02:40

That depends on whether you mostly insert values that don't exist or mostly values that *do* exist. In the latter case, I'd argue the performance will be poorer due to tons of exceptions being raised and ignored.

GSerg 2010-08-04 21:11:38

@Gserg: correct. But then OP would have posted an INSERT/UPDATE question, not test for inert arguably. We use this to filter out a few thousand duplicates in dozen million new rows per day

gbn 2010-08-05 04:19:03

Answer 6

+4 A:

You could use MERGE:

MERGE INTO Target
USING (VALUES (@primaryKey, @value1, @value2)) Source (key, value1, value2)
ON Target.key = Source.key
WHEN MATCHED THEN
    UPDATE SET value1 = Source.value1, value2 = Source.value2
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Name, ReasonType) VALUES (@primaryKey, @value1, @value2)

Chris Smith 2010-08-04 17:45:14

Answer 7

+2 A:

The simplest solution, in my opinion, is to simply use the statement

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

before executing your statement. This requires no locking hints to achieve your desired output, which are a bit of a "code smell", and can become tough to manage if queries get sufficiently complex.

You were close when you asked if the operation is atomic. It is, but your problem is that it's not sufficiently isolated from other statements that run.

You might ask yourself why this occurs. The answer is scalability. Operations that aren't as isolated don't require the level of locks as those that are more isolated. By being "looser" with the locks, you cut the lock contention on the system and the system becomes more scalable.

When you increase your transaction isolation level to REPEATABLE READ, the database server automatically puts the proper locks on objects to ensure that no other transaction can change what your transaction reads while it's reading it. By putting the UPDLOCK hint on the table, you have essentially manually increased the isolation level of your transaction.

I hope this clears up a bit of conceptual stuff. Oh, and one more thing. If you're using connection pooling, make sure to set the transaction isolation level back to READ COMMITTED (which is usually the default) at the end of your statement, or future statements executed against that pooled connection may also run in the higher isolation level, which could cause your system to scale poorly.

(also see gbn's answer, which has the benefit of using less locks, but for your specific situation should be adequate)

Dave Markle 2010-08-04 18:17:50

Thanks - I take your point about locking hints being smelly. I'll definately just use your suggestion for the general case, but I think I'm gonna go with gbn's more direct approach in this particular case.

Adam 2010-08-04 19:59:37

ansaurus

tags:

views:

answers:

Only inserting a row if it's not already there

related questions