I am running an archive script that deletes rows from a large table (~50m records) based on the date they were entered. The date field is the clustered index on the table, so it is the column my conditional statement filters on.

I am running this delete in a while loop and have tried batch sizes anywhere from 1,000 to 100,000 records. Regardless of batch size, it is surprisingly slow: roughly 10,000 records deleted per minute. Looking at the execution plan, a lot of the time is spent on "Index Delete" operations. There are about 15 fields in the table, and roughly 10 of them have some sort of index on them. Is there any way to get around this issue? I'm not even sure why each index delete takes so long. Can someone shed some light on exactly what's happening here? This is a sample of my execution plan:

[Image: execution plan screenshot]

(The Sequence points to the Delete command)

This database is live and is inserted into often, which is why I'm hesitant to use the copy-and-truncate method of trimming the size. Are there any other options I'm missing here?

+1  A: 

More of a workaround, but can you add an IsDeleted flag to the table and update that to 1 rather than deleting the rows? You will need to modify your SELECTs and UPDATEs to use this flag.

Then you can schedule deletion or archiving of these records for off-hours.

RedFilter
Well, I intend on running this script regularly in the off hours daily to keep the database trim (it deletes any record older than 2 years), but the initial run is so slow it would take something like 4 hours to complete currently, which is more than the powers-that-be want to have the server tied up. Thanks for the suggestion though!
Kevin
In that case just delete smaller batches at a time (e.g., 1,000) so that there is no perceptible impact on server load from the end user perspective, and repeat this with a delay of 30 - 60 seconds between each loop. Then just let it run until it is done. Might take a week or two, but should get the job done.
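
The small-batch-with-delay loop described in this comment could be sketched roughly as follows. This is only an illustration under assumptions: the table name `dbo.EventLog` and the clustered date column `EnteredOn` are hypothetical (the thread never names them), and the batch size and delay are the values suggested above. Variables are declared and set separately so the script also parses on SQL Server 2005.

```sql
-- Hypothetical names: dbo.EventLog (table), EnteredOn (clustered datetime column).
DECLARE @cutoff datetime;
DECLARE @batch  int;
DECLARE @rows   int;

SET @cutoff = DATEADD(year, -2, GETDATE());  -- "older than 2 years", per the thread
SET @batch  = 1000;                          -- small batch, as suggested
SET @rows   = 1;                             -- forces the first loop iteration

WHILE @rows > 0
BEGIN
    -- Each DELETE runs as its own short transaction under autocommit,
    -- so locks and log usage stay bounded per batch.
    DELETE TOP (@batch)
    FROM dbo.EventLog
    WHERE EnteredOn < @cutoff;

    -- Must be read immediately after the DELETE.
    SET @rows = @@ROWCOUNT;

    -- Pause between batches so other sessions get through.
    IF @rows > 0
        WAITFOR DELAY '00:00:30';
END;
```

The loop ends once a batch deletes nothing. Because the cutoff is fixed before the loop starts, rows inserted while it runs never qualify, so it should terminate even on a live table.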
RedFilter
+1  A: 

It would take some work to implement given that this is in production, but if you are on SQL Server 2005 / 2008 you should investigate converting the table to a partitioned table; the removal of old data can then be achieved extremely quickly. Partitioning is designed for a 'rolling window' type effect and prevents large-scale deletes from tying up a table / process.

Unfortunately, with the table in production, migrating it to this technique will take some T-SQL coding, some knowledge, and a weekend to upgrade / migrate it. Once in place, though, any existing selects and inserts will work against it seamlessly; the partition maintenance and addition / removal is where you need the T-SQL to control the process.

Andrew
+2  A: 

I second the suggestion that @NickLarsen made in a comment. Find out if you have unused indexes and drop them. This could reduce the overhead of those index-deletes, which might be enough of an improvement to make the operation more timely.

Another, more radical strategy is to drop all the indexes, perform your deletes, and then recreate the indexes for the now-smaller data set. This doesn't necessarily interrupt service, but it could make queries a lot slower in the meantime. That said, I am not a Microsoft SQL Server expert, so you should take my advice on this strategy with a grain of salt.
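
As a rough sketch of that drop-and-recreate idea for a single nonclustered index: the names `dbo.EventLog`, `CustomerId` and `IX_EventLog_CustomerId` are hypothetical, since the thread does not list the actual indexes. The clustered index on the date column is deliberately left alone here, because the delete's filter relies on it.

```sql
-- Hypothetical index and table names; repeat per nonclustered index.
-- Note: an index that backs a PRIMARY KEY or UNIQUE constraint cannot be
-- dropped this way; the constraint itself would have to be dropped.
DROP INDEX IX_EventLog_CustomerId ON dbo.EventLog;

-- ... run the mass delete here, with fewer indexes to maintain ...

-- Rebuild the index once over the now-smaller table.
CREATE NONCLUSTERED INDEX IX_EventLog_CustomerId
    ON dbo.EventLog (CustomerId);
```

Queries that depended on the dropped index will fall back to other access paths until it is recreated, which is the slowdown mentioned above.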

Bill Karwin
+1  A: 

Assume for each record in the table there are 5 index records.

Now each delete is in essence 5 operations.

Add to that, you have a clustered index. Notice that the clustered index delete time is huge, about 10x longer than for the other indexes? This is because your data is being reorganized with every record deleted.

I would suggest dropping at least that index, doing a mass delete, then reapplying it. Index operations on delete and insert are inherently costly. A single rebuild is likely a lot faster.

Regards, Chris

Chris Kannon
+2  A: 

Deleting 10k records from a clustered index + 5 nonclustered ones should definitely not take 1 minute. It sounds like you have a really, really slow IO subsystem. What are the values for:

  • Avg. Disk sec/Write
  • Avg. Disk sec/Read
  • Avg. Disk Write Queue Length
  • Avg. Disk Read Queue Length

on each drive involved in the operation (including the log ones!)? If you placed indexes in separate filegroups and allocated each filegroup to its own LUN or its own disk, then you can identify which indexes are more problematic. Also, the log flush may be a major bottleneck. SQL Server doesn't have much control here; it is all in your own hands how to speed things up. That time is not spent in CPU cycles, it is spent waiting for IO to complete, and you need an IO subsystem calibrated for the load you demand.
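
The counters listed above are Performance Monitor counters. As a complementary check from inside SQL Server (a different technique, not the one named in this answer), per-file IO stall times can be read from the `sys.dm_io_virtual_file_stats` DMV. A minimal sketch, assuming it is run in the database being archived; the figures are cumulative since the instance last started, so they are averages over that whole period rather than over the delete alone.

```sql
-- Average IO latency per database file, including the log file.
SELECT
    mf.physical_name,
    mf.type_desc,                       -- ROWS (data) or LOG
    vfs.num_of_reads,
    vfs.num_of_writes,
    CASE WHEN vfs.num_of_reads = 0 THEN 0
         ELSE 1.0 * vfs.io_stall_read_ms / vfs.num_of_reads
    END AS avg_read_ms,
    CASE WHEN vfs.num_of_writes = 0 THEN 0
         ELSE 1.0 * vfs.io_stall_write_ms / vfs.num_of_writes
    END AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(DB_ID(), NULL) AS vfs
JOIN sys.master_files AS mf
    ON  mf.database_id = vfs.database_id
    AND mf.file_id     = vfs.file_id
ORDER BY avg_write_ms DESC;
```

A high average write latency on the log file would point toward the log-flush bottleneck mentioned above.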

To reduce the IO load you should look into making indexes narrower. Primarily, make sure the clustered index is the narrowest possible that works. Then, make sure the nonclustered indexes don't include spurious, unused large columns (I've seen that...). A major gain may be had by enabling page compression. And ultimately, inspect index usage stats in sys.dm_db_index_usage_stats and see if any index is good for the axe.
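
A query along these lines could be used to inspect `sys.dm_db_index_usage_stats` for one table. It is a sketch only: `dbo.EventLog` is a hypothetical table name, and the DMV is reset when the instance restarts, so low numbers are only meaningful after a representative period of uptime.

```sql
-- Read activity vs. maintenance cost for each nonclustered index on one table.
-- dbo.EventLog is a hypothetical name; substitute the real table.
SELECT
    i.name                        AS index_name,
    ISNULL(s.user_seeks,   0)     AS user_seeks,
    ISNULL(s.user_scans,   0)     AS user_scans,
    ISNULL(s.user_lookups, 0)     AS user_lookups,
    ISNULL(s.user_updates, 0)     AS user_updates   -- writes that had to maintain the index
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
    ON  s.object_id   = i.object_id
    AND s.index_id    = i.index_id
    AND s.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID(N'dbo.EventLog')
  AND i.type_desc = 'NONCLUSTERED'
ORDER BY ISNULL(s.user_seeks, 0) + ISNULL(s.user_scans, 0) + ISNULL(s.user_lookups, 0);
```

Indexes that sort to the top with zero or near-zero reads but many `user_updates` are the candidates for removal: they cost work on every insert and delete without serving queries.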

If you can't reduce the IO load much, you should try to split it. Add filegroups to the database, move large indexes to separate filegroups, and place the filegroups on separate IO paths (distinct spindles).
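
Moving one index to its own filegroup might look roughly like this. Everything named here is an assumption for illustration: the database `ArchiveDemo`, the filegroup `FG_Indexes`, the file path on drive `E:`, and the index `IX_EventLog_CustomerId` on `dbo.EventLog (CustomerId)`.

```sql
-- Hypothetical database, filegroup, file path and index names.
ALTER DATABASE ArchiveDemo ADD FILEGROUP FG_Indexes;

ALTER DATABASE ArchiveDemo
ADD FILE (
    NAME       = N'ArchiveDemo_Indexes',
    FILENAME   = N'E:\SqlData\ArchiveDemo_Indexes.ndf',  -- a drive on a separate IO path
    SIZE       = 1024MB,
    FILEGROWTH = 256MB
) TO FILEGROUP FG_Indexes;

-- Rebuild the existing index onto the new filegroup in one step.
-- The key columns must match the existing definition.
CREATE NONCLUSTERED INDEX IX_EventLog_CustomerId
    ON dbo.EventLog (CustomerId)
    WITH (DROP_EXISTING = ON)
    ON FG_Indexes;
```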

For future regular delete operations, the best alternative is to use partition switching: have all indexes aligned with the clustered index partitioning and, when the time is due, just drop the last partition for a lightning-fast deletion.
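
A minimal sketch of partition switching, assuming a fresh table partitioned by month on its date column. All object names (`pfEventDate`, `psEventDate`, `dbo.EventLog`, `dbo.EventLog_Staging`, the columns and the boundary dates) are hypothetical. Converting an existing 50m-row production table is a larger job that this does not cover, and as far as I know table partitioning in SQL Server 2005 / 2008 requires Enterprise Edition.

```sql
-- Partition 1 holds everything before 2008-01-01, partition 2 holds January 2008, etc.
CREATE PARTITION FUNCTION pfEventDate (datetime)
AS RANGE RIGHT FOR VALUES ('20080101', '20080201', '20080301');

-- All partitions on PRIMARY here for simplicity.
CREATE PARTITION SCHEME psEventDate
AS PARTITION pfEventDate ALL TO ([PRIMARY]);

CREATE TABLE dbo.EventLog (
    EnteredOn datetime     NOT NULL,
    Payload   varchar(200) NULL
) ON psEventDate (EnteredOn);

CREATE CLUSTERED INDEX CIX_EventLog_EnteredOn
    ON dbo.EventLog (EnteredOn)
    ON psEventDate (EnteredOn);

-- Empty staging table: same columns, same clustered index,
-- same filegroup as the partition being switched out.
CREATE TABLE dbo.EventLog_Staging (
    EnteredOn datetime     NOT NULL,
    Payload   varchar(200) NULL
) ON [PRIMARY];

CREATE CLUSTERED INDEX CIX_EventLog_Staging_EnteredOn
    ON dbo.EventLog_Staging (EnteredOn)
    ON [PRIMARY];

-- Metadata-only: the oldest partition's rows now belong to the staging table.
ALTER TABLE dbo.EventLog SWITCH PARTITION 1 TO dbo.EventLog_Staging;

-- Discard (or archive) the old rows without row-by-row index deletes.
TRUNCATE TABLE dbo.EventLog_Staging;
```

Any nonclustered indexes on the partitioned table would need to be created on the same partition scheme (aligned) and mirrored on the staging table for the switch to be allowed. Retiring old boundary values and adding new ones is done with `ALTER PARTITION FUNCTION ... MERGE RANGE` and `SPLIT RANGE`, which is the maintenance work the answers above refer to.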

Remus Rusanu
I am fairly confident the hardware is more than capable. We have separate Intel SSDs for the log and data, and another for the OS. It has dual quad-core Xeon processors and 16 GB of DDR3 memory. We ended up just running this over the course of the weekend to clean up the ~25m rows. Now we'll run it nightly to keep the DB nice and clean, and it should only take a minute or two.
Kevin
Are you so confident that you're not even going to measure?
Remus Rusanu