I have a table that contains log entries for a program I'm writing. I'm looking for ideas on an SQL query (I'm using SQL Server Express 2005) that will keep the newest X number of records, and delete the rest. I have a datetime column that is a timestamp for the log entry.

I figure something like the following would work, but I'm not sure of the performance with the IN clause for larger numbers of records. Performance isn't critical, but I might as well do the best I can the first time.

DELETE FROM MyTable WHERE PrimaryKey NOT IN
(SELECT TOP 10000 PrimaryKey FROM MyTable ORDER BY TimeStamp DESC)

I should mention that this query will run 3-4 times a day (as part of another process), so the number of records that will be deleted with each query will be small in comparison to the number of records that will be kept.

A: 

The query you have is about as efficient as it gets, and it's readable.

NOT IN and NOT EXISTS are more efficient than LEFT JOIN/IS NULL, but only when neither column can be null. You can read this link for a more in-depth comparison.
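For reference, a NOT EXISTS form of the question's query would look like the sketch below (same MyTable/PrimaryKey/TimeStamp names as the question; with a non-nullable PrimaryKey the two forms should produce the same result):

```sql
DELETE FROM MyTable
WHERE NOT EXISTS
    (SELECT 1
     FROM (SELECT TOP 10000 PrimaryKey
           FROM MyTable
           ORDER BY TimeStamp DESC) AS keepers  -- the newest 10,000 rows
     WHERE keepers.PrimaryKey = MyTable.PrimaryKey)
```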

OMG Ponies
A: 

Try this:

DECLARE @X int
SELECT @X = COUNT(*) - 10000 FROM MyTable

IF @X > 0  -- guard: TOP with a negative value is an error
    DELETE MyTable
    WHERE PrimaryKey IN (SELECT TOP (@X) PrimaryKey
                         FROM MyTable
                         ORDER BY TimeStamp ASC
                        )

Whether this helps depends on how many rows you are deleting: if it's fewer than 10,000, this might run faster, as it identifies the rows to delete rather than the rows to keep.

KM
A: 

DELETE FROM MyTable WHERE TimeStamp < (SELECT MIN(TimeStamp) FROM (SELECT TOP 10000 TimeStamp FROM MyTable ORDER BY TimeStamp DESC) AS t)

or

DELETE FROM MyTable WHERE TimeStamp < (SELECT MIN(TimeStamp) FROM MyTable WHERE PrimaryKey IN (SELECT TOP 10000 PrimaryKey FROM MyTable ORDER BY TimeStamp DESC))

Not sure whether these are improvements as far as efficiency goes, though. Note that both delete by timestamp rather than by key, so rows that tie on TimeStamp with the 10,000th-newest row will be kept.

aterimperator
A: 

Whether this is feasible depends on your scenario and how many rows you have, but there is a potentially far more efficient approach:

  1. Create a new copy of the log table with a new name
  2. Insert into the new table, the most recent 10,000 records from the original table
  3. Drop the original table (or rename)
  4. Rename the new table, to the proper name

This obviously requires more thought than just deleting rows (e.g. if the table has an IDENTITY column, this needs to be set up on the new table, etc.). But if you have a large table, it is more efficient to copy 10,000 rows to a new table and drop the original than to delete millions of rows just to leave 10,000 behind.
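A rough sketch of those steps, using the question's MyTable/TimeStamp names (the MyTable_New name is just illustrative; note that SELECT ... INTO does not copy indexes, constraints, or triggers, so those would need to be recreated on the new table):

```sql
-- 1-2. Create the copy and fill it with the newest 10,000 rows
SELECT TOP 10000 *
INTO MyTable_New
FROM MyTable
ORDER BY TimeStamp DESC;

-- 3. Drop the original (or rename it instead, to keep it as a backup)
DROP TABLE MyTable;

-- 4. Rename the copy to the proper name
EXEC sp_rename 'MyTable_New', 'MyTable';
```

Any process writing to the log while this runs would lose rows or fail, so it belongs in a maintenance window or inside a transaction with appropriate locking.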

AdaTheDev
A: 

Try this. It uses a CTE to assign each row an ordinal number, and then deletes only X rows per pass; you can tune that variable to suit your server.

Adding the READPAST table hint should prevent the delete from blocking on locked rows.

DECLARE @numberToDelete INT;
DECLARE @rowsToKeep INT;
SET @rowsToKeep = 50000;
SET @numberToDelete = 1000;

WHILE 1 = 1
BEGIN
    WITH RowsToDelete AS
    (
        -- newest row gets rn = 1; everything past @rowsToKeep is deletable
        SELECT ROW_NUMBER() OVER (ORDER BY TimeStamp DESC) AS rn,
            *
        FROM MyTable WITH (READPAST)
    )
    DELETE TOP (@numberToDelete) FROM RowsToDelete
    WHERE rn > @rowsToKeep;

    IF @@ROWCOUNT = 0
        BREAK;
END;
Simmo