ansaurus

Question

Most Efficient (Fast) T-SQL DELETE For Many Rows?

Answer 1

+3 A:

Maybe:

DELETE FROM MyRecords
WHERE  tag IN (1, 2, 555) -- build a list
OR longTag IN ('LongTag1')

I suspect indexes would help your deletes but drastically slow your inserts, so I wouldn't play with that too much. But then my intuition isn't exactly perfect yet, you might be able to tune FillFactor or other items to get around that issue, and the one thing I do know for sure it that you really want to profile both anyway.

Another option is to load new inserts into a temp table (named something like InputQueue), and then join the temp table on MyRecords to handle filtering updates. This would also make it easy to do the update in two steps: you could delete Tags and longTags as separate operations and that might turn out to be much more efficient.

Joel Coehoorn 2009-04-03 16:05:42

exists against a temp table will be much faster than in for non-trivial lists

annakata 2009-04-03 16:12:31

@Joel: Is this faster than what I'm doing?

John Dibling 2009-04-03 16:19:07

@annakata: what temp table? Can you elaborate please?

John Dibling 2009-04-03 16:19:54

I can't _know_ if it's faster than what you're doing. I don't have your data or your system. I can only conjecture. As annakata points out, the 2nd idea is _probably_ going to be an improvement, but I can't make guarantees. So far for all answers I like Tomalak's idea best.

Joel Coehoorn 2009-04-03 16:29:00

Yeah I'm just speaking hypothetically as indeed we don't have enough data - the temp table would be one of your own creation, it's just that IN is an expensive thing to iterate through where exists on a temp is fast because it returns as soon as it finds a "true"

annakata 2009-04-03 16:30:58

Missing 555 from the IN clause

Joe Philllips 2009-04-03 17:36:18

Answer 2

+2 A:

Seems that your table is not indexed on (tag) and (longTag)

Build two indexes: one on (tag), one on (longTag)

If you are planning to delete a really big number of records, then declare two table variables, fill them with values and delete like this:

DECLARE @tag TABLE (id INT);
DECLARE @longTag TABLE (id VARCHAR(50));

INSERT
INTO  @tag
VALUES (`tag1`)

INSERT
INTO  @tag
VALUES (`tag2`)

/* ... */

INSERT INTO @longTag
VALUES ('LongTag1')

/* ... */


DELETE
FROM    MyRecords r
WHERE   r.tag IN (SELECT * FROM @tag)
        OR r.longTag IN (SELECT * FROM @longTag)

You may also try to perform a two-pass DELETE:

DELETE
FROM    MyRecords r
WHERE   r.tag IN (SELECT * FROM @tag)

DELETE
FROM    MyRecords r
WHERE   r.longTag IN (SELECT * FROM @longTag)

and see what statements runs longer, to see if there's an issue with the indexes.

Quassnoi 2009-04-03 16:06:22

I do have indexes for those two columns; although admittedly they may not be designed optimally.

John Dibling 2009-04-03 16:18:15

Answer 3

+2 A:

Using ORs may cause a table scan - can you break it up into four statements? Wrapping each in a transaction may also speed things up.

DELETE from MyRecords
WHERE tag = 1

DELETE from MyRecords
WHERE tag = 2

DELETE from MyRecords
WHERE tag = 555

DELETE from MyRecords
WHERE longTag = 'LongTag1'

DJ 2009-04-03 16:10:16

There could actually be 1000-2000 deletes in the real data. If I send 2000 DELETEs is that optimal?

John Dibling 2009-04-03 17:12:39

You mean you have 1000-2000 different tags?

DJ 2009-04-03 17:13:23

Yes, 1000-2000 different tags.

John Dibling 2009-04-03 18:52:19

Answer 4

+5 A:

I think splitting the giant DELETE statement into 2 DELETE may help.

1 DELETE to deal with tag and a separate DELETE to deal with longTag. This would help SQL server to choose to use indexes efficiently.

Of course you can still fire the 2 DELETE statements in 1 DB round-trip.

Hope this helps

Canton 2009-04-03 16:11:19

When you say "in 1 DB round-trip", I think the point is that you can wrap it in a transaction so that you have complete rollback capability.

Rob Garrison 2009-04-03 20:13:31

I did mean 1 DB round-trip. As John mentioned the high frequency and volume of DELETE, cutting 50% round-trip should help.

Canton 2009-04-04 01:38:01

/* 2 round-trip */string sql1 = "DELETE FROM MyRecords WHERE tag IN (1,2,55)";string sql2 = "DELETE FROM MyRecords WHERE longTag IN ('LongTag1')";/* 1 round-trip */string sql = "DELETE FROM MyRecords WHERE tag IN (1,2,55); DELETE FROM MyRecords WHERE longTag IN ('LongTag1')";

Canton 2009-04-04 01:40:36

oops.. sorry I can't format the code in comments...But I hope it shows my intention.

Canton 2009-04-04 01:41:23

Answer 5

+3 A:

Something like this could streamline the process (you would simply INSERT the rows, no matter if they already exist - no need for an up-front DELETE statement):

CREATE TRIGGER dbo.TR_MyTable_Merge 
   ON  dbo.MyTable 
   INSTEAD OF INSERT
AS 
BEGIN
  SET NOCOUNT ON;

  BEGIN TRANSACTION

  DELETE MyTable 
  FROM   MyTable t INNER JOIN inserted i ON t.tag = i.tag 

  DELETE MyTable 
  FROM   MyTable t INNER JOIN inserted i ON t.longTag = i.longTag

  INSERT MyTable 
  SELECT * FROM inserted

  COMMIT TRANSACTION

  SET NOCOUNT OFF;
END

EDIT: Previously Combined DELETE statement broken up into two separate statements, to enable optimal index use.

Not using DELETE at all, but rather UPDATEing the affected/duplicate rows in place will be easier on the indexes.

Tomalak 2009-04-03 16:15:13

I like this but what about using a try/catch and instead of doing a delete, do an update with the insert occurring in the catch block when the update fails (for a new record).

tvanfosson 2009-04-03 16:19:40

Hm... Possible. I'll look into writing an alternative.

Tomalak 2009-04-03 16:21:57

I just added a comment to my OP about what the normal case will be in my system. The normal case will be that a row will not already exist, so in the vast majority of cases the UPDATE will fail. How does this impact the optimal design?

John Dibling 2009-04-03 16:26:42

I'm not sure if I get your question. The above trigger would rid you of the need to do any manual delete up-front. You would simply insert rows to the table, and the trigger takes care of them should they exist already.

Tomalak 2009-04-03 16:38:57

Answer 6

+3 A:

Check out this video which showcases how to do a 'nibbling' delete. The process works well and can definitely reduce the locking/collision problems you're seeing:

http://www.sqlservervideos.com/video/nibbling-deletes

Michael K Campbell 2009-04-03 16:19:23

Answer 7

+4 A:

An OR (or an in) almost works as if each OR operand is a different query. That is, it turns into a table scan, and for each row, the database has to test each OR operand as a predicate, until it finds a match or runs out of operands.

The only reason to package this up is to make it one logical unit of work. You could also wrap a bunch of deletes in a transaction, and only commit when all finish successfully.

Quassnoi makes an interesting suggestion -- to use a table --, but since he then uses INs and ORs, it comes out the same.

But try this.

Create a new table that mirrors your real table. Call it u_real_table. Index it on tag and longTag.

Put all your incoming data into u_real_table.

Now, when you're ready to do your bulk thing, instead join the mirror table o the real table on tag. From the real table, delete all the tag'd rows in the u_real_table:

delete real_table from real_table a 
   join u_real_table b on (a.tag = b.tag);
insert into real_table select * 
   from u_real_table where tag is not null;

See what we did here? Since we're joining only on tag, there's a greater chance the tag index can be used.

First we deleted everything new, then we inserted the new replacements. We could also do an update here. Which is faster depends on your table structure and its indices.

We didn't have to write a script to do it, we just had to have inserted the records in u_real_table.

Now we do the same thing for longTags:

delete real_table from real_table a 
   join u_real_table b on (a.longTag = b.longTag);
insert into real_table select * 
   from u_real_table where longTag is not null;

Finally, we clear out u_real_table:

delete from u_real_table;

Obviously, we wrap the whole each delete/insert pair in a transaction, so that the delete only becomes real when the subsequent insert succeeds, and then we wrap the whole thing in another transaction. Because it is a logical unit of work.

This method reduces your manual work, reduces the possibility of a manual error, and has some chance of speeding up the deletes.

Note that this relies on missing tags and longTags correctly being null, not zero or the empty string.

tpdi 2009-04-03 16:26:22

This suggestion is very interesting, thank you!

John Dibling 2009-04-03 17:17:12

+1, imo the best suggestion. You can also do a INSERT / WHERE NOT EXISTS + an UPDATE with this table, instead of DELETE + INSERT

Brimstedt 2009-04-04 17:47:15

I agree this is the best suggestion. I am implementing this now in my production code.

John Dibling 2009-04-06 22:58:28

Just so you know, I did update my server code to implement the DELETEs as outlined in this thread, and it runs about 4x faster than my original code! Thanks very much for the suggestion.

John Dibling 2009-04-20 16:10:55

Sweet. Glad it's working for you.

tpdi 2009-04-20 18:54:20

Answer 8

+1 A:

Indexing:

Consider using an indexed persisted computed column for longTag which stores a checksum of longTag. Instead of indexing 'LongTag1', you index a 4-byte int value (86939596).

Then your look-ups are [hopefully*] faster, and you just have to include the longTag value in the query/delete. Your code would be slightly more complex, but the indexing is likely to be much more efficient.

* Requires testing

Rob Garrison 2009-04-03 20:32:57

How do I do the checksum?

John Dibling 2009-04-04 14:57:04

Something like this:CREATE TABLE ... longTag varchar(32), longTagCksm int AS CHECKSUM(longTag) PERSISTED NOT NULL,...For more on indexed persisted computed columns: http://www.databasejournal.com/features/mssql/article.php/3722696

Rob Garrison 2009-04-06 16:25:49

ansaurus

tags:

views:

answers:

Most Efficient (Fast) T-SQL DELETE For Many Rows?

related questions