I have an SP in SQL Server which runs hundreds of times a minute and needs to check incoming traffic against a database. At the moment it does the following:

INSERT INTO table
SELECT @value1,@value2 WHERE NOT EXISTS 
(SELECT * FROM table WHERE value1 = @value1 AND value2 = @value2);

However, I could also go with

IF NOT EXISTS(SELECT * FROM table WHERE value1 = @value1 AND value2 = @value2)    
   INSERT INTO table (value1,value2) VALUES (@value1,@value2);

Which would be faster? I get the feeling there's not much difference between them but I'm historically not very good at TSQL... =/

UPDATE: Whoops... meant to state that the EXISTS uses more than 1 value to find if a record exists, so a unique constraint won't work. Edited the sample to reflect that...

A: 

If you want the values to be unique, why not just create a unique constraint on the value, do an INSERT without the SELECT, and gracefully handle the constraint violation error?

That'd be faster than either of these approaches.

Also, your first approach doesn't work - by the time you get to select, you already inserted the value so select will obviously find what you just inserted.

DVK
I believe you are wrong in your last paragraph. The `WHERE` clause binds to the `SELECT`, which is indeed executed "before" the insert.
Jørn Schou-Rode
well I guess I made my first StackOverflow mistake then, I simplified those examples! It's actually checking for a record with TWO values, so the records aren't unique... I'll edit it to reflect that. I can 100% guarantee the first method works; I've tested it plenty of times.
roryok
Adding a `unique constraint` spanning multiple columns should be no problem.
Jørn Schou-Rode
really? Cool! I'll look into that
roryok
A: 

If I had to guess, I would guess the second option would be faster. SQL Server would not have to do any kind of setup for the INSERT if the EXISTS check fails, whereas in the first one, it might look up table and field names and prepare for an insert that never happens. However, I would try both in the query analyzer and see what the plan says.

Ray
On the other hand, the first option is a single batch statement, while the second is several statements in a more procedural style. An RDBMS is usually very efficient with batch statements, less so with procedural/imperative code. That said, I have no idea which of the two statements at hand performs the best :)
Jørn Schou-Rode
+1  A: 

just do it, and ignore any error (assumes a unique constraint on Value)...

BEGIN TRY
    INSERT INTO Table (value) VALUES (@value);
END TRY
BEGIN CATCH
    PRINT 'it was already in there!'
END CATCH

Since this runs hundreds of times a minute, locking hints should be added to the SELECTs, and the check and insert should be wrapped in a transaction, to avoid a race condition:

(SELECT * FROM Table WITH (UPDLOCK, HOLDLOCK)  WHERE value = @value);
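Put together, the check-then-insert with those locking hints might look like this (a sketch, using the single-value table from the snippets above; `Table` is bracketed only because it is a reserved word):

```sql
-- Sketch: check-then-insert wrapped in a transaction. UPDLOCK + HOLDLOCK
-- hold a range lock until commit, so no concurrent session can insert the
-- same value between the check and the INSERT.
BEGIN TRANSACTION;

IF NOT EXISTS (SELECT * FROM [Table] WITH (UPDLOCK, HOLDLOCK)
               WHERE value = @value)
    INSERT INTO [Table] (value) VALUES (@value);

COMMIT TRANSACTION;
```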

However, my proposed approach of just doing the INSERT and ignoring any duplicate-key constraint error would avoid the race condition as well.

KM
Thanks KM, I'm using (NOLOCK) on the select at the moment... should have mentioned that too!
roryok
@roryok, if two calls with the same values are made to this SQL at the same time (you say it runs hundreds of times a minute), they could both find no existing row and both attempt the insert, creating duplicates and defeating your intentions.
KM
Note that the exception will doom the transaction if raised inside the trigger.
Quassnoi
+2  A: 

After adding a gazillion comments on this question and its answers, I will have my own go at answering it.

I would not expect any major difference in performance between the two approaches proposed in the original question. On one hand, as pointed out by Ray, the second approach might save you from doing some preparations for the insert, but on the other hand, an RDBMS usually performs best with batch statements, as in the first solution.

KM and DVK suggest adding a UNIQUE constraint, which will make the uniqueness test implicit, but will require you to add some kind of error handling around your INSERT statement. I have a hard time seeing why this should yield any additional performance, assuming that you already have an index covering the two columns. If you do not have such an index, add it, and then reconsider your need for more performance.

Whether the uniqueness check is performed explicitly or implicitly should not matter, AFAIK. If anything is gained by having the check done "inside" the stomach of the DBMS, that gain might just be eaten up by the overhead associated with raising and handling errors when duplicates exist.


The bottom line: Assuming an index is already in place, if you still find yourself lusting for performance, my recommendation is that you perform empirical tests on the three suggested solutions. Cook up a small program that simulates the expected input data, and blow each of the three solutions away with a few billion rows, including a plausible amount of duplicates. If you do this, be sure to post your results :-)
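Such a harness could be sketched as follows (all object names are hypothetical; the loop times the first approach from the question, and the other two variants can be swapped in the same way):

```sql
-- Hypothetical harness: time one candidate statement under a stream of
-- random, deliberately overlapping (value1, value2) pairs.
CREATE TABLE dbo.TestTable (value1 INT NOT NULL, value2 INT NOT NULL);
CREATE INDEX IX_TestTable ON dbo.TestTable (value1, value2);

DECLARE @i INT, @n INT, @v1 INT, @v2 INT, @start DATETIME;
SET @i = 0;
SET @n = 100000;
SET @start = GETDATE();

WHILE @i < @n
BEGIN
    -- Small random range, so a realistic share of pairs are duplicates
    SET @v1 = ABS(CHECKSUM(NEWID())) % 1000;
    SET @v2 = ABS(CHECKSUM(NEWID())) % 1000;

    -- Candidate under test: the first approach from the question
    INSERT INTO dbo.TestTable
    SELECT @v1, @v2 WHERE NOT EXISTS
        (SELECT * FROM dbo.TestTable WHERE value1 = @v1 AND value2 = @v2);

    SET @i = @i + 1;
END

SELECT DATEDIFF(ms, @start, GETDATE()) AS elapsed_ms;
```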

Jørn Schou-Rode
Thanks, currently there's an index on each column but not a multiple column index covering both. I think I'll go with that and add a unique constraint. I didn't know I could add a unique constraint spanning multiple columns.
roryok
@roryok: in fact, you can just change your index to a `UNIQUE INDEX`, it will be the same.
Quassnoi
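In T-SQL, swapping the plain index for a unique one could look like this (index and table names are hypothetical):

```sql
-- Replace a plain two-column index with a unique one; from then on,
-- duplicate (value1, value2) pairs are rejected by the engine itself.
DROP INDEX IX_TestTable ON dbo.TestTable;
CREATE UNIQUE INDEX IX_TestTable ON dbo.TestTable (value1, value2);
```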
Thanks guys! First time posting on StackOverflow and I love it!
roryok
+2  A: 

In a highly concurrent environment, a concurrent INSERT can happen between the IF NOT EXISTS check and the INSERT in your second query.

Your first query will place shared locks on the records it examines, which will not be lifted until the end of the query, so it will be impossible to insert a new record while the query is running.

However, you should not rely solely on this behavior. Place an additional UNIQUE constraint on the value.

It will not only make the database more consistent, but will also create an index which will make the first query faster.

Quassnoi
+1 for mentioning the risk of a race condition.
Jørn Schou-Rode
+3  A: 

Both variants are incorrect. You will insert pairs of duplicate @value1, @value2, guaranteed.

The correct way to handle this is to enforce a unique constraint on two columns and to always INSERT and handle the constraint violation:

ALTER TABLE Table ADD CONSTRAINT uniqueValue1Value2 UNIQUE (value1, value2);

and to insert:

BEGIN TRY
   INSERT INTO Table (value1, value2) VALUES (@value1, @value2);
END TRY
BEGIN CATCH
   DECLARE @error_number int, @error_message NVARCHAR(4000), @xact_state INT;
   SET @error_number = ERROR_NUMBER();
   SET @error_message = ERROR_MESSAGE();
   SET @xact_state = XACT_STATE();
   IF (@xact_state = -1)
   BEGIN
     ROLLBACK TRANSACTION;
   END
   IF (@error_number != 2627) /* 2627 is 'Cannot insert duplicate key in object ...' */
   BEGIN
      RAISERROR(N'Error inserting into Table: %i %s', 16, 1, @error_number, @error_message);
   END
END CATCH

While this may seem complicated, one has to factor in a little detail named correctness. It is by far simpler when compared with a solution based on lock hints. It is also the most performant solution: it does only one seek. All other solutions need at least two seeks (one to validate that the row can be inserted, one to do the insert).

Remus Rusanu
Thanks Remus. What transaction will that ROLLBACK TRANSACTION affect? The entire SP is enclosed in a transaction statement which includes a few more of these insert statements, I don't want to end the whole thing if the value is already there...
roryok
When XACT_STATE() is -1 you have no choice; it means the transaction is doomed and it *must* roll back. A unique constraint violation will not cause a rollback; this will happen only if a more severe error is encountered, like running out of disk space, for instance.
Remus Rusanu