I am running the following stored procedure to delete a large number of records. I understand that the DELETE statement writes to the transaction log and that deleting many rows will make the log grow.

I have looked into other options, such as creating a new table, inserting the records to keep, and then truncating the source table, but that method will not work for me.

How can I make the stored procedure below more efficient while keeping the transaction log from growing unnecessarily?

CREATE PROCEDURE [dbo].[ClearLog]
(
  @Age int = 30
)
AS
BEGIN
  -- SET NOCOUNT ON added to prevent extra result sets from
  -- interfering with SELECT statements.
  SET NOCOUNT ON;

  -- DELETE ERRORLOG
  WHILE EXISTS ( SELECT [LogId] FROM [dbo].[Error_Log] WHERE DATEDIFF( dd, [TimeStamp], GETDATE() ) > @Age )
  BEGIN
    SET ROWCOUNT 10000
    DELETE [dbo].[Error_Log] WHERE DATEDIFF( dd, [TimeStamp], GETDATE() ) > @Age

    WAITFOR DELAY '00:00:01'
    SET ROWCOUNT 0
  END
END
+2  A: 

Here is how I would do it:

CREATE PROCEDURE [dbo].[ClearLog]
(
  @Age int = 30
)
AS
BEGIN
  SET NOCOUNT ON;

  DECLARE @d DATETIME
        , @batch INT;
  SET @batch = 10000;
  SET @d = DATEADD( dd, -@Age, GETDATE() );

  WHILE (1 = 1)
  BEGIN
    DELETE TOP (@batch) [dbo].[Error_Log]
    WHERE [Timestamp] < @d;

    IF (0 = @@ROWCOUNT)
      BREAK;
  END
END
  • Make the Timestamp comparison SARGable (see the index sketch after this list).
  • Evaluate GETDATE() once, at the start of the batch, to produce a consistent run (otherwise the loop can run indefinitely as new records 'age' into the delete window while the old ones are being deleted).
  • Use TOP instead of SET ROWCOUNT, which is deprecated: "Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in the next release of SQL Server."
  • Check @@ROWCOUNT to break the loop instead of running a redundant SELECT.
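
As an illustration of the SARGable point (and of the indexing note in the comments below), a minimal sketch; the index name is hypothetical, the table and column come from the question:

-- Hypothetical index name: supports a range seek on [TimeStamp] so each
-- batched DELETE does not have to scan the whole table.
CREATE NONCLUSTERED INDEX IX_Error_Log_TimeStamp
    ON [dbo].[Error_Log] ([TimeStamp]);
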
Remus Rusanu
I just tested this on 9 million rows, deleting 1000 at a time. It worked just fine.
Coov
For anyone not familiar with the term SARGable: the Timestamp field should be indexed. Also, this approach is probably only appropriate with the SIMPLE recovery model, where transaction log space is reused after each batch.
Scott Munro
+1  A: 

Assuming you have the option of rebuilding the error log table on a partition scheme, one option would be to partition the table on date and switch out the old partitions. Do a search for 'alter table switch partition' to dig a bit further.
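
A minimal sketch of the switch, assuming the table has been rebuilt on a partition scheme over the date column; the staging table name and partition number are hypothetical:

-- Assumes [dbo].[Error_Log] is partitioned by [TimeStamp] and that
-- [dbo].[Error_Log_Staging] has an identical structure on the same filegroup.
-- Switching a partition out is a metadata-only operation, so very little is logged.
ALTER TABLE [dbo].[Error_Log]
    SWITCH PARTITION 1 TO [dbo].[Error_Log_Staging];

-- The switched-out rows can then be dropped cheaply.
TRUNCATE TABLE [dbo].[Error_Log_Staging];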

ConcernedOfTunbridgeWells
This will do the trick IF it fits into your architecture.
Philip Kelley
+1  A: 

How about running it more often and deleting fewer rows each time? Run this every 30 minutes:

CREATE PROCEDURE [dbo].[ClearLog] 
(
  @Age int = 30
)
AS
BEGIN
    SET NOCOUNT ON;
    SET ROWCOUNT 10000 --I assume you are on an old version of SQL Server and can't use TOP
    DELETE dbo.Error_Log WHERE Timestamp < GETDATE() - @Age
    WAITFOR DELAY '00:00:01' --why???
    SET ROWCOUNT 0
END

The way this handles the dates does not truncate the time portion, so each run deletes only about 30 minutes' worth of data.

KM
Another good trick: can you space out when the deletes occur and so minimize their effect?
Philip Kelley
@Philip Kelley, that is the idea with "every 30 minutes": by still limiting each run to 10000 rows, the load is spread out among the other "real" user transactions.
KM
Ah. That would have the procedure running all the time, wouldn't it? I'd recommend moving the timing out to a SQL Agent job that calls the procedure every 30 minutes to drop the rows.
Philip Kelley
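
For reference, a minimal sketch of the SQL Agent job suggested above; the job, schedule, and database names are hypothetical:

USE msdb;
-- Create the job and a single T-SQL step that calls the procedure.
EXEC dbo.sp_add_job       @job_name = N'Purge Error_Log';
EXEC dbo.sp_add_jobstep   @job_name = N'Purge Error_Log',
                          @step_name = N'Run ClearLog',
                          @subsystem = N'TSQL',
                          @command = N'EXEC dbo.ClearLog @Age = 30;',
                          @database_name = N'YourDatabase';
-- Schedule it every 30 minutes and attach the schedule to the job.
EXEC dbo.sp_add_schedule  @schedule_name = N'Every 30 minutes',
                          @freq_type = 4,            -- daily
                          @freq_interval = 1,
                          @freq_subday_type = 4,     -- subday unit = minutes
                          @freq_subday_interval = 30;
EXEC dbo.sp_attach_schedule @job_name = N'Purge Error_Log',
                            @schedule_name = N'Every 30 minutes';
EXEC dbo.sp_add_jobserver @job_name = N'Purge Error_Log';   -- run on the local server
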
A: 

A solution I have used in the past was to temporarily set the recovery model to "Bulk Logged", then back to "Full" at the end of the stored procedure:

DECLARE @dbName NVARCHAR(128);
SELECT @dbName = DB_NAME();

EXEC('ALTER DATABASE ' + @dbName + ' SET RECOVERY BULK_LOGGED')

WHILE EXISTS ( SELECT 1 FROM [dbo].[Error_Log]
               WHERE [TimeStamp] < DATEADD( dd, -@Age, GETDATE() ) )
BEGIN
    -- Delete a batch of rows, then WAITFOR before the next batch
    DELETE TOP (10000) [dbo].[Error_Log]
    WHERE [TimeStamp] < DATEADD( dd, -@Age, GETDATE() );

    WAITFOR DELAY '00:00:01';
END

EXEC('ALTER DATABASE ' + @dbName + ' SET RECOVERY FULL')

This will significantly reduce the transaction log consumption for large batches. I don't like that it sets the recovery model for the whole database (not just for this session), but it's the best solution I could find.

ckarras
+1  A: 

If your database is in FULL recovery mode, the only way to minimize the impact of your delete statements is to "space them out": only delete so many rows during each transaction log backup interval. For example, if you do t-log backups every hour, delete, say, only 20,000 rows per hour. That may not drop everything you need all at once, but will things even out after 24 hours, or after a week?

If your database is in SIMPLE or BULK_LOGGED mode, breaking the deletes into chunks should do it. But since you're already doing that, I'd have to guess your database is in FULL recovery mode. (That, or the connection calling the procedure may be part of an open transaction.)
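
As an illustrative aside on the FULL-recovery case (the database name and backup path are hypothetical), it is the regular log backup that actually frees log space for reuse between the spaced-out delete batches:

-- Check the current recovery model, then back up the log between batches
-- so the space written by each delete can be reused.
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = DB_NAME();

BACKUP LOG YourDatabase TO DISK = N'D:\Backups\YourDatabase_log.trn';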

Philip Kelley