I have a large SQL Server database with a table of about 45 million records. I am archiving this table and need to remove all entries more than two years old. Inserting into my archive table works fine, but the deletes are inefficient.

My problem lies with the indexes currently on the table. I would like to delete (and archive) in 1000-record chunks. To do this, I need to determine the "top" 1000 records that meet the requirement (more than two years old). The DateTime stamp on the row carries the clustered index, so this is great for grabbing the rows. However, SQL Server 2000 does not allow DELETE TOP 1000, so I need to do something like:

DELETE FROM <table> WHERE [UniqueID] IN 
(SELECT TOP 1000 [UniqueID] FROM <table> WHERE [DateTime] < @TwoYearsAgo)

This would work great if UniqueID were indexed. Since it is not, this takes a very long time (it scans the table for each of the 1000 records to be deleted). There are no other indexes on the table that uniquely identify the records. I am told it would be too costly to build an index on UniqueID, as this is a live DB. Can anyone point out a way to optimize this query? Thanks!

+5  A: 

How about rewriting the query?

SET ROWCOUNT 1000
DELETE FROM <table> WHERE [DateTime] < @TwoYearsAgo

See documentation on SET ROWCOUNT (Transact-SQL).
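To chip away at the old rows in batches, the statement above can be wrapped in a loop. The following is only a rough sketch: dbo.MainTable is a hypothetical name standing in for <table>, and the way @TwoYearsAgo is computed is an assumption. SET ROWCOUNT stays in effect for the session until it is changed, so the sketch resets it to 0 at the end.

```sql
-- Sketch only. dbo.MainTable is a hypothetical stand-in for <table>.
DECLARE @TwoYearsAgo datetime
SET @TwoYearsAgo = DATEADD(year, -2, GETDATE())   -- assumed cutoff calculation

DECLARE @Deleted int
SET @Deleted = 1

SET ROWCOUNT 1000              -- limit each following statement to 1000 rows

WHILE @Deleted > 0
BEGIN
    DELETE FROM dbo.MainTable WHERE [DateTime] < @TwoYearsAgo
    SET @Deleted = @@ROWCOUNT  -- rows removed by the DELETE just above
END

SET ROWCOUNT 0                 -- back to the default: no row limit
```

This only covers the delete. If rows must be copied to an archive first, the same set of rows has to be identified for both statements, which this loop does not do on its own.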

Also note that, per the documentation for DELETE, it supports the TOP clause, but that is apparently new in SQL Server 2005 and up. I mention it because it sounds like it isn't supported on your database server, but have you actually tried using it? I don't have access to the SQL Server 2000 documentation, so I'm unsure whether it is supported on that version. It very well might not be.

DELETE TOP (1000) FROM <table> WHERE [DateTime] < @TwoYearsAgo

Note the difference from the way TOP on SELECT can be written, without the parentheses. For UPDATE, DELETE and INSERT, the expression must be parenthesized, even if it's only a constant number like above.

Lasse V. Karlsen
I am also pushing to move to Server 2008, but we are most likely going to trim the database before we move it to a new instance.
Kevin
Yes, I have tried both with and without parentheses, to no avail.
Kevin
+2  A: 

You could use SET ROWCOUNT:

SET ROWCOUNT 1000
DELETE FROM <table> WHERE [DateTime] < @TwoYearsAgo
AdaTheDev
I had seen this suggestion somewhere, but I was under the impression this was dangerous in a live database. I will look into it more, thank you for the suggestion.
Kevin
A: 

you can also do

DELETE TOP(1000) FROM <table> WHERE [DateTime] < @TwoYearsAgo

God only knows why they use top(x) for delete and top x for select, most people don't even seem to know about this feature!

edit: Apparently it's 2005+, so you should probably ignore this.

Paul Creasey
+1  A: 

I had to do something similar a while back -- make lightweight insert and delete to move old records to an archive table. Although counterintuitive, the fastest and least impactful solution I found was:

  1. Make a small #temp table with the values of IDs for the top (x) rows. If ID really can't be indexed in your scenario, you might use date AND ID instead, so the combination of the two can use an index.

  2. begin tran

  3. Insert into archive table where ID and DATE in ( #temp )

  4. Delete from main table where ID and DATE in ( #temp )

  5. commit

  6. Truncate #temp

  7. Repeat

Having the temp table to stage the row identifiers is more total work than a straight delete, but makes the process very lightweight in cases where you want to just chip away a little at a time without blocking.
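The steps above might look roughly like this in T-SQL. This is a sketch under several assumptions, not a tested implementation: dbo.MainTable and dbo.ArchiveTable are hypothetical names, UniqueID is assumed to be an int, the archive table is assumed to have the same column layout as the main table, and error handling is left out.

```sql
-- Sketch only. dbo.MainTable and dbo.ArchiveTable are hypothetical names;
-- UniqueID is assumed to be int and both tables are assumed to share columns.
DECLARE @TwoYearsAgo datetime
SET @TwoYearsAgo = DATEADD(year, -2, GETDATE())

CREATE TABLE #batch ([DateTime] datetime NOT NULL, [UniqueID] int NOT NULL)

DECLARE @Rows int
SET @Rows = 1

WHILE @Rows > 0
BEGIN
    -- Step 1: stage date + ID for the next batch (uses the clustered index)
    INSERT INTO #batch ([DateTime], [UniqueID])
    SELECT TOP 1000 [DateTime], [UniqueID]
    FROM dbo.MainTable
    WHERE [DateTime] < @TwoYearsAgo
    ORDER BY [DateTime]

    SET @Rows = @@ROWCOUNT     -- 0 means nothing old is left

    IF @Rows > 0
    BEGIN
        BEGIN TRAN             -- steps 2-5: archive and delete together

        INSERT INTO dbo.ArchiveTable
        SELECT m.*
        FROM dbo.MainTable m
        JOIN #batch b
          ON m.[DateTime] = b.[DateTime] AND m.[UniqueID] = b.[UniqueID]

        DELETE m
        FROM dbo.MainTable m
        JOIN #batch b
          ON m.[DateTime] = b.[DateTime] AND m.[UniqueID] = b.[UniqueID]

        COMMIT TRAN

        TRUNCATE TABLE #batch  -- step 6: empty the staging table
    END
END

DROP TABLE #batch
```

Joining on [DateTime] first lets both the insert and the delete seek on the clustered index, with UniqueID only used to tell apart rows that share a timestamp.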

Also I agree with Lasse - can't see the point of a unique id with no index, and therefore no constraint, to enforce it.

onupdatecascade
I tried something similar to this with a locally declared temp table, but without having a unique identifier indexed, it didn't help much. I'll try using both the date and uniqueID, see if that gets me anywhere. Thanks!
Kevin
+7  A: 

You can delete a subquery:

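DELETE <table> FROM (
  SELECT TOP 1000 *
  FROM <table>
  WHERE [DateTime] < @TwoYearsAgo) AS t1
-- The derived table needs an alias and a join back to the target table
WHERE <table>.[DateTime] = t1.[DateTime]
  AND <table>.[UniqueID] = t1.[UniqueID];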

See example E in the SQL 2000 DELETE syntax documentation. This is recommended over the SET ROWCOUNT approach. In SQL 2005 and later you can specify TOP directly in DELETE.

Remus Rusanu
A: 

I wonder whether you must stick with the 1000-record chunk requirement. If it is there to limit server load and is otherwise arbitrary, you may want to try the following, which removes the oldest day's worth of rows on each run, since you already have a clustered index on [DateTime]:

DELETE FROM <table> 
WHERE [DateTime] < @TwoYearsAgo 
and [DateTime] < (select dateadd(day, 1, min([DateTime])) from <table>)
hongliang