views:

619

answers:

12

There is quite often situation when you need to execute INSERT, UPDATE or DELETE statement based on some condition. And my question is whether the affect on the performance of the query add IF EXISTS before the command.

Example

IF EXISTS(SELECT 1 FROM Contacs WHERE [Type] = 1)
    UPDATE Contacs SET [Deleted] = 1 WHERE [Type] = 1

What about INSERTs or DELETEs?

+1  A: 

The performance of an IF EXISTS statement:

IF EXISTS(SELECT 1 FROM mytable WHERE someColumn = someValue)

depends on the indexes present to satisfy the query.

Mitch Wheat
+3  A: 

You shouldn't do this in most cases. Depending on your transaction level you have created a race condition, now in your example here it wouldn't matter to much, but the data can be changed from the first select to the update. And all you've done is forced SQL to do more work

The best way to know for sure is to test the two differences and see which one gives you the appropriate performance.

JoshBerke
+1  A: 

There is a slight effect, since you're doing the same check twice, at least in your example:

IF EXISTS(SELECT 1 FROM Contacs WHERE [Type] = 1)

Has to query, see if there are any, if true then:

UPDATE Contacs SET [Deleted] = 1 WHERE [Type] = 1

Has to query, see which ones...same check twice for no reason. Now if the condition you're looking for is indexed it ought to be quick, but for large tables you could see some delay just because you're running the select.

Nick Craver
+1  A: 

Yes this will affect performance (the degree to which performance will be affected will be affected by a number of factors). Effectively you are doing the same query "twice" (in your example). Ask yourself whether or not you need to be this defensive in your query and in what situations would the row not be there? Also, with an update statement the rows affected is probably a better way to determine if anything has been updated.

bleeeah
+3  A: 

IF EXISTS will basically do a SELECT - the same one that UPDATE would.

As such, it will decrease performance - if there's nothing to update, you did the same amount of work (UPDATE would have queried same lack of rows as your select) and if there's something to update, you juet did an un-needed select.

DVK
An alternative is burnall's answer: just do the update first and then check @@rowcount. If it's zero then nothing was updated and you call fall through to an insert.
Seth Petry-Johnson
`IF EXISTS` is **not** a good design for an insert-or-update operation. It has a race condition, even inside a `BEGIN TRAN`/`COMMIT TRAN` block.
Aaronaught
@Aaronaught - good point. Removed that piece.
DVK
+7  A: 

That is not useful for just one update/delete/insert.
Possibly adds performance if several operators after if condition.
In last case better write

update a set .. where ..
if @@rowcount > 0 
begin
    ..
end
+1. I must admit that on first reading of the question, it was not obvious (to me at least) that this was what the OP was referring to. Must be late...
Mitch Wheat
Exactly. Do what will *usually* be the needed action, check the results, then do the alternate case if necessary.
Patrick Karcher
+1 I was going go down this path as well but wasn't clear from his question...Good answer
JoshBerke
This is fine for `UPDATE` and `DELETE` but it might not work at all for `INSERT` if there's a unique index.
Aaronaught
+4  A: 

You should not do it for UPDATE and DELETE, as if there is impact on performance, it is not a positive one.

For INSERT there might be situations where your INSERT will raise an exception (UNIQUE CONSTRAINT violation etc), in which case you might want to prevent it with the IF EXISTS and handle it more gracefully.

van
+1  A: 

This largely repeats the preceding (by time) five (no, six) (no, seven) answers, but:

Yes, the IF EXISTS structure that you have by and large will double the work done by the database. While IF EXISTS will "stop" when it finds the first matching row (it doesn't need to find them all), it's still extra and ultimately pointless effort--for updates and deletes.

  • If no such row(s) exist, IF EXISTS will a full scan (table or index) to determine this.
  • If one or more such rows exist, IF EXISTS will read enough of the table/index to find the first one, and then UPDATE or DELETE will then re-read that the table to find it again and process it -- and it will read "the rest of" the table to see if there are any more to process as well. (Fast enough if properly indexed, but still.)

So either way, you'll end up reading the entire table or index at least once. But, why bother with the IF EXISTS in the first place?

UPDATE Contacs SET [Deleted] = 1 WHERE [Type] = 1 

or the similar DELETE will work fine whether or not there are any rows found to process. No rows, table scanned, nothing modified, you're done; 1+ rows, table scanned, everything that ought to be is modified, done again. One pass, no fuss, no muss, no having to worry about "did the database get changed by another user between my first query and my second query".

INSERT is the situation where it might be useful -- check if the row is present before adding it, to avoid Primary or Unique Key violations. Of course you have to worry about concurrency -- what if someone else is trying to add this row at the same time as you? Wrapping this all into a single INSERT would handle it all in an implicit transaction (remember your ACID properties!):

INSERT Contacs (col1, col2, etc) values (val1, val2, etc) where not exists (select 1 from Contacs where col1 = val1)
IF @@rowcount = 0 then <didn't insert, process accordingly>
Philip Kelley
+2  A: 
AlexKuznetsov
+7  A: 

I'm not completely sure, but I get the impression that this question is really about upsert, which is the following atomic operation:

  • If the row exists in both the source and target, UPDATE the target;
  • If the row only exists in the source, INSERT the row into the target;
  • (Optionally) If the row exists in the target but not the source, DELETE the row from the target.

Developers-turned-DBAs often naïvely write it row-by-row, like this:

-- For each row in source
IF EXISTS(<target_expression>)
    IF @delete_flag = 1
        DELETE <target_expression>
    ELSE
        UPDATE target
        SET <target_columns> = <source_values>
        WHERE <target_expression>
ELSE
    INSERT target (<target_columns>)
    VALUES (<source_values>)

This is just about the worst thing you can do, for several reasons:

  • It has a race condition. The row can disappear between IF EXISTS and the subsequent DELETE or UPDATE.

  • It's wasteful. For every transaction you have an extra operation being performed; maybe it's trivial, but that depends entirely on how well you've indexed.

  • Worst of all - it's following an iterative model, thinking about these problems at the level of a single row. This will have the largest (worst) impact of all on overall performance.

One very minor (and I emphasize minor) optimization is to just attempt the UPDATE anyway; if the row doesn't exist, @@ROWCOUNT will be 0 and you can then "safely" insert:

-- For each row in source
BEGIN TRAN

UPDATE target
SET <target_columns> = <source_values>
WHERE <target_expression>

IF (@@ROWCOUNT = 0)
    INSERT target (<target_columns>)
    VALUES (<source_values>)

COMMIT

Worst-case, this will still perform two operations for every transaction, but at least there's a chance of only performing one, and it also eliminates the race condition (kind of).

But the real issue is that this is still being done for each row in the source.

Before SQL Server 2008, you had to use an awkward 3-stage model to deal with this at the set level (still better than row-by-row):

BEGIN TRAN

INSERT target (<target_columns>)
SELECT <source_columns> FROM source s
WHERE s.id NOT IN (SELECT id FROM target)

UPDATE t SET <target_columns> = <source_columns>
FROM target t
INNER JOIN source s ON t.d = s.id

DELETE t
FROM target t
WHERE t.id NOT IN (SELECT id FROM source)

COMMIT

As I said, performance was pretty lousy on this, but still a lot better than the one-row-at-a-time approach. SQL Server 2008, however, finally introduced MERGE syntax, so now all you have to do is this:

MERGE target
USING source ON target.id = source.id
WHEN MATCHED THEN UPDATE <target_columns> = <source_columns>
WHEN NOT MATCHED THEN INSERT (<target_columns>) VALUES (<source_columns>)
WHEN NOT MATCHED BY SOURCE THEN DELETE;

That's it. One statement. If you're using SQL Server 2008 and need to perform any sequence of INSERT, UPDATE and DELETE depending on whether or not the row already exists - even if it's just one row - there is no excuse not to be using MERGE.

You can even OUTPUT the rows affected by a MERGE into a table variable if you need to find out afterward what was done. Simple, fast, and risk-free. Do it.

Aaronaught
A: 

If you're using mysql you can use this - http://dev.mysql.com/doc/refman/5.1/en/insert-on-duplicate.html

charles
A: 
IF EXISTS....UPDATE

Don't do it. It forces two scans/seeks instead of one.

If update doesn't find a match on the WHERE clause, the cost of the update statement is just a seek/scan.

If it does find a match, and if you preface it w/ IF EXISTS, it has to find the same match twice. And in a concurrent environment, what was true for the EXISTS may not be true any longer for the UPDATE.

This is precisely why UPDATE/DELETE/INSERT statements allow a WHERE clause. Use it!

Peter