views: 2395

answers: 12

Deletes on SQL Server are sometimes slow, and I've often needed to optimize them to reduce the time they take. I've been googling a bit for tips on how to do that, and I've found diverse suggestions. I'd like to know your favorite and most effective techniques to tame the delete beast, and how and why they work.

what I've gathered so far:

  • be sure foreign keys have indexes

  • be sure the where conditions are indexed

  • use of WITH ROWLOCK

  • destroy unused indexes, delete, rebuild the indexes

now, your turn.

+2  A: 

I have much more experience with Oracle, but very likely the same applies to SQL Server as well:

  • when deleting a large number of rows, issue a table lock, so the database doesn't have to do lots of row locks
  • if the table you delete from is referenced by other tables, make sure those other tables have indexes on the foreign key column(s); otherwise the database will do a full table scan on the referencing table for every deleted row, just to check that the delete doesn't violate the foreign key constraint (see the sketch below)
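
A minimal sketch of both points, assuming hypothetical dbo.Parent / dbo.Child tables linked by a ParentID column:

    -- Take one table lock up front instead of escalating from many row locks
    DELETE FROM dbo.Parent WITH (TABLOCKX)
    WHERE CreatedDate < '20090101';

    -- Index the referencing column so the constraint check on each deleted
    -- row is an index seek rather than a full scan of dbo.Child
    CREATE NONCLUSTERED INDEX IX_Child_ParentID ON dbo.Child (ParentID);
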
ammoQ
+2  A: 

(if the indexes are "unused", why are they there at all?)

One option I've used in the past is to do the work in batches. The crude way would be to use SET ROWCOUNT 20000 (or whatever) and loop (perhaps with a WAITFOR DELAY) until you get rid of it all (@@ROWCOUNT = 0).

This might help reduce the impact upon other systems.
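
A minimal sketch of that loop, assuming a hypothetical dbo.BigTable and an age-based condition (DELETE TOP (N) is a more modern alternative to SET ROWCOUNT):

    -- Crude batching as described above: 20000 rows per pass
    SET ROWCOUNT 20000;
    WHILE 1 = 1
    BEGIN
        DELETE FROM dbo.BigTable
        WHERE CreatedDate < '20090101';

        IF @@ROWCOUNT = 0 BREAK;   -- nothing left to delete

        WAITFOR DELAY '00:00:01';  -- give other work a chance to run
    END
    SET ROWCOUNT 0;                -- reset so later statements are unaffected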

Marc Gravell
"unused" in the delete
pomarc
But there are usually more things going on than just the delete... I guess it may help, but you'd have to check that it doesn't (overall) make the system worse...
Marc Gravell
Do not delete indexes just because they are unused in the delete! Other people are using the database for other things!
HLGEM
I should have pointed out that destroy-delete-rebuild should be used only for lengthy deletes done while the db is not in use, such as at night during batch operations, when the db is an enterprise db used only in the daytime. Obviously, destroying indexes on a live, in-use db is not a good idea.
pomarc
+2  A: 

To be honest, deleting a million rows from a table scales just as badly as inserting or updating a million rows. It's the size of the rowset that's the problem, and there's not much you can do about that.

My suggestions:

  • Make sure that the table has a primary key and clustered index (this is vital for all operations).
  • Make sure that the clustered index is such that minimal page re-organisation would occur if a large block of rows were to be deleted.
  • Make sure that your selection criteria are SARGable.
  • Make sure that all your foreign key constraints are currently trusted (see the check sketched below).
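
On the last point, a hedged sketch of finding and re-validating untrusted constraints (the table and constraint names are hypothetical):

    -- Foreign keys the optimizer currently cannot trust
    SELECT name
    FROM sys.foreign_keys
    WHERE is_not_trusted = 1;

    -- Re-check the existing data so the constraint becomes trusted again
    ALTER TABLE dbo.Child WITH CHECK CHECK CONSTRAINT FK_Child_Parent;
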
Christian Hayter
SARGable: In relational databases, a condition (or predicate) in a query is said to be sargable if the DBMS engine can take advantage of an index to speed up the execution of the query (using index seeks, not covering indexes). The term is derived from a contraction of Search ARGument Able. (Wikipedia)
pomarc
+2  A: 

I'll add another one to this:

Make sure your transaction isolation level and database options are set appropriately. If your SQL Server is set not to use row versioning, or other queries are running at an isolation level where they will wait on the rows being deleted, you could be setting yourself up for some very poor performance while the operation is happening.
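
For example, one way to keep readers from blocking behind a long delete is row versioning; a hedged sketch (MyDb is a hypothetical database name, and switching the option on needs exclusive access to the database):

    -- Readers see the last committed version instead of waiting on the delete's locks
    ALTER DATABASE MyDb SET READ_COMMITTED_SNAPSHOT ON;

    -- Or per session, provided ALLOW_SNAPSHOT_ISOLATION is enabled on the database
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;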

Dave Markle
+1  A: 

On very large tables where you have a very specific set of criteria for deletes, you could also partition the table, switch out the partition, and then process the deletions.

The SQLCAT team has been using this technique on really really large volumes of data. I found some references to it here but I'll try and find something more definitive.
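
A rough sketch of the switch-out idea, assuming dbo.BigTable is already partitioned and dbo.BigTable_Staging is an empty table with an identical structure on the same filegroup (all names hypothetical):

    -- Metadata-only move of an entire partition out of the main table
    ALTER TABLE dbo.BigTable
    SWITCH PARTITION 3 TO dbo.BigTable_Staging;

    -- The switched-out rows can then be dropped almost instantly
    TRUNCATE TABLE dbo.BigTable_Staging;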

RobS
+2  A: 

If you have lots of foreign key tables, start at the bottom of the chain and work up. The final delete will go faster and block fewer things if there are no child records to cascade delete (which I would NOT turn on if I had a large number of child tables, as it will kill performance).

Delete in batches.

If you have foreign key tables that are no longer being used (you'd be surprised how often production databases end up with old tables nobody will get rid of), get rid of them or at least break the FK/PK connection. No sense checking a table for records if it isn't being used.

Don't delete - mark records as deleted and then exclude marked records from all queries. This is best set up at the time of database design. A lot of people use this because it is also the fastest way to get back records accidentally deleted. But it is a lot of work to set up in an already existing system.
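
A minimal sketch of that soft-delete pattern, with hypothetical table and column names:

    -- One-time schema change
    ALTER TABLE dbo.Orders ADD IsDeleted bit NOT NULL
        CONSTRAINT DF_Orders_IsDeleted DEFAULT (0);

    -- "Deleting" becomes a cheap update
    UPDATE dbo.Orders SET IsDeleted = 1 WHERE OrderID = 42;

    -- Every query (or a view over the table) excludes the marked rows
    SELECT OrderID, OrderDate
    FROM dbo.Orders
    WHERE IsDeleted = 0;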

HLGEM
+1  A: 

There are deletes and then there are deletes. If you are aging out data as part of a trim job, you will hopefully be able to delete contiguous blocks of rows by clustered key. If you have to age out data from a high-volume table that is not contiguous, it is very, very painful.
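
For the friendly case, a hedged sketch of an age-out that lines up with a clustered index on a date column (names are hypothetical):

    -- Rows older than six months sit in a contiguous range of the clustered
    -- index, so the delete touches a compact block of pages instead of
    -- hopping all over the table
    DELETE FROM dbo.EventLog
    WHERE EventDate < DATEADD(MONTH, -6, GETDATE());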

+5  A: 

The following article, Fast Ordered Delete Operations, may be of interest to you.

Performing fast SQL Server delete operations

The solution focuses on utilising a view in order to simplify the execution plan produced for a batched delete operation. This is achieved by referencing the given table once rather than twice, which in turn reduces the amount of I/O required.
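
As a hedged sketch of the idea (not the article's exact code; the table, column and view names are hypothetical):

    -- The view carries the TOP and ORDER BY...
    CREATE VIEW dbo.v_BigTable_Oldest
    AS
    SELECT TOP (2000) *
    FROM dbo.BigTable
    ORDER BY CreatedDate;
    GO

    -- ...so each batched delete references the base table only once
    DELETE FROM dbo.v_BigTable_Oldest;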

John Sansom
+1  A: 

If it is true that UPDATES are faster than DELETES, you could add a status column called DELETED and filter on it in your selects. Then run a proc at night that does the actual deletes.
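
A hedged sketch of such a nightly clean-up proc, reusing a hypothetical IsDeleted flag on dbo.Orders:

    CREATE PROCEDURE dbo.PurgeDeletedOrders
    AS
    BEGIN
        SET NOCOUNT ON;
        -- Purge in batches so locks and log growth stay small
        WHILE 1 = 1
        BEGIN
            DELETE TOP (5000) FROM dbo.Orders WHERE IsDeleted = 1;
            IF @@ROWCOUNT = 0 BREAK;
        END
    END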

Bill
+1  A: 

I wonder if it's time for garbage-collecting databases? You mark a row for deletion and the server deletes it later during a sweep. You wouldn't want this for every delete - because sometimes a row must go now - but it would be handy on occasion.

quillbreaker
+2  A: 

The problem is you haven't defined your conditions enough. I.e. what exactly are you optimizing?

For example, is the system down for nightly maintenance and no users are on the system? And are you deleting a large % of the database?

If you're offline and deleting a large %, it may make sense to just build a new table with the data to keep, drop the old table, and rename. If you're deleting a small %, you likely want to batch things in as large batches as your log space allows. It entirely depends on your database, but dropping indexes for the duration of the rebuild may hurt or help -- if it's even possible due to being "offline".
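
A rough sketch of the build-and-swap approach for the offline, large-% case (names are hypothetical; indexes, constraints and permissions would need to be recreated on the new table):

    -- Copy only the rows you want to keep
    SELECT *
    INTO dbo.BigTable_Keep
    FROM dbo.BigTable
    WHERE CreatedDate >= '20090101';

    -- Swap it in place of the original
    DROP TABLE dbo.BigTable;
    EXEC sp_rename 'dbo.BigTable_Keep', 'BigTable';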

If you're online, what's the likelihood your deletes are conflicting with user activity (and is user activity predominantly read, update, or what)? Or, are you trying to optimize for user experience or speed of getting your query done? If you're deleting from a table that's frequently updated by other users, you need to batch but with smaller batch sizes. Even if you do something like a table lock to enforce isolation, that doesn't do much good if your delete statement takes an hour.

When you define your conditions better, you can pick one of the other answers here. I like the link in Rob Sanders' post for batching things.

Matt
thanks matt. well, my question is quite general: i've had slow deletes on various occasions, and this was meant to be a way to gather the tips people could share on the issue.
pomarc
A: 

Do you have foreign keys with referential integrity activated? Do you have triggers active?
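
A quick, hedged way to check both on the target table (dbo.BigTable is a hypothetical name):

    -- Triggers defined on the table, and whether they are disabled
    SELECT name, is_disabled
    FROM sys.triggers
    WHERE parent_id = OBJECT_ID('dbo.BigTable');

    -- Foreign keys that reference the table (each one is verified per deleted row)
    SELECT name, is_disabled
    FROM sys.foreign_keys
    WHERE referenced_object_id = OBJECT_ID('dbo.BigTable');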

Cătălin Pitiș