I'm writing a process that archives rows from a SQL Server table based on a datetime column. I want to move all the rows with a date before X, but the problem is that there are millions of rows for each date, so doing a BEGIN TRANSACTION...INSERT...DELETE...COMMIT for each date takes too long, and locks up the database for other users.

Is there a way that I can do it in smaller chunks? Maybe using ROWCOUNT or something like that?

I'd originally considered something like this:

SET ROWCOUNT 1000

DECLARE @RowsLeft DATETIME
DECLARE @ArchiveDate DATETIME

SET @RowsLeft = (SELECT TOP 1 dtcol FROM Events WHERE dtcol <= @ArchiveDate)

WHILE @RowsLeft IS NOT NULL
BEGIN

    INSERT INTO EventsBackups
    SELECT TOP 1000 * FROM Events WHERE dtcol <= @ArchiveDate

    DELETE Events WHERE dtcol <= @ArchiveDate

    SET @RowsLeft = (SELECT TOP 1 dtcol FROM Events WHERE dtcol <= @ArchiveDate)

END

But then I realized that I can't guarantee that the rows I'm deleting are the ones I just backed up. Or can I...?

UPDATE: Another option I'd considered was adding a step:

  1. SELECT TOP 1000 rows that meet my date criteria into a temp table
  2. Begin Transaction
  3. Insert from temp table into archive table
  4. Delete from source table, joining to temp table across every column
  5. Commit transaction
  6. Repeat 1-5 until no rows remain that meet the date criteria

Does anybody have an idea for how the expense of this series might compare to some of the other options discussed below?
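
A rough sketch of that loop (the cutoff is the @ArchiveDate from my first attempt; the temp table name and "SomeColumn" are just placeholders, since only dtcol is a real column name here):

WHILE EXISTS (SELECT 1 FROM Events WHERE dtcol <= @ArchiveDate)
BEGIN
    -- 1. stage a chunk of candidate rows
    SELECT TOP 1000 *
      INTO #ArchiveBatch
      FROM Events
     WHERE dtcol <= @ArchiveDate

    -- 2. begin transaction
    BEGIN TRANSACTION

        -- 3. copy the staged rows into the archive
        INSERT INTO EventsBackups
        SELECT * FROM #ArchiveBatch

        -- 4. delete from the source, joining to the staged rows across every column
        DELETE e
          FROM Events e
          INNER JOIN #ArchiveBatch b
                  ON e.dtcol = b.dtcol
                 AND e.SomeColumn = b.SomeColumn   -- ...repeated for every remaining column

    -- 5. commit
    COMMIT TRANSACTION

    DROP TABLE #ArchiveBatch
END

One caveat I can already see: if any column is nullable, the equality join will miss rows containing NULLs, and exact duplicate rows would all be deleted even though only one copy was staged.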

DETAIL: I'm using SQL 2005, since somebody asked.

A: 

How about:

INSERT INTO EventsBackups
SELECT TOP 1000 * FROM Events ORDER BY YourKeyField

DELETE Events
WHERE YourKeyField IN (SELECT TOP 1000 YourKeyField FROM Events ORDER BY YourKeyField)
Aaron Alton
As an aside, this is a perfect case for sliding window partitioning, if you're able to make use of it: http://weblogs.sqlteam.com/dang/archive/2008/08/30/Sliding-Window-Table-Partitioning.aspx It's a metadata switch, so the entire load could be done in a few seconds at most.
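
A minimal sketch of the switch itself, assuming Events is already partitioned on the datetime column and EventsArchiveStaging is an empty table with an identical structure on the same filegroup (the staging table name and partition number are illustrative):

ALTER TABLE dbo.Events
    SWITCH PARTITION 1 TO dbo.EventsArchiveStaging
-- a metadata-only change: the oldest partition's rows now belong to the staging table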
Aaron Alton
A: 

How about not doing it all at once?

INSERT INTO EventsBackups
SELECT * FROM Events WHERE <date criteria>

Then later,

DELETE Events
  FROM Events
  INNER JOIN EventsBackups ON Events.ID = EventsBackups.ID

or the equivalent.

Nothing you've said so far suggests you need a transaction.

John Saunders
It's too resource-intensive to do a massive insert like that on a very active table. It needs to be "chunked" to prevent large resource waits.
Aaron Alton
But it's the backup table that will be locked, not the Events table, so is locking really a problem? Later you can perform the deletes in chunks once the rows are in your backup.
Robin Day
I'm using the transaction so I can rollback the insert if the delete fails. I don't want to have any records appear in the archive table that are still in the live table, since that could lead to duplicates later. I'm actually attempting to work around an application's incredibly cumbersome internal archive process, which was never meant to deal with as much data as we have, and I want to avoid anything that could possibly break it.
rwmnau
A: 

Have you got an index on the date field? If you haven't, SQL may be forced to escalate to a table lock, which will lock out all your users while your archive statements run.

I think you will need an index for this operation to perform at all well! Put an index on your date field and try your operation again!
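
For example (a sketch, assuming the date column is called dtcol as in the question; the index name is arbitrary):

CREATE NONCLUSTERED INDEX IX_Events_dtcol
    ON dbo.Events (dtcol)
-- gives the archive loop a seek on the date range instead of a full table scan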

Noel Kennedy
I am using SQL 2005, and there are no indexes on the table at all, which makes the SELECT statements expensive to begin with.
rwmnau
A: 

Could you make a copy of Events, move all rows with dates >= x to that, drop Events, and rename the copy to Events? Or copy, truncate, and then copy back? If you can afford a little downtime, this would probably be the quickest approach.
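
A rough sketch of the rename variant, assuming the date column is dtcol; here the old table is renamed rather than dropped, so its rows are still available to copy into the archive, and any indexes, constraints, and permissions would have to be recreated on the new table by hand:

-- copy only the rows to keep into a new table
SELECT *
  INTO Events_Keep
  FROM Events
 WHERE dtcol >= @ArchiveDate   -- @ArchiveDate = the cutoff x

-- swap the tables; Events_Old can be archived and dropped at leisure
EXEC sp_rename 'Events', 'Events_Old'
EXEC sp_rename 'Events_Keep', 'Events'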

John M Gant
+2  A: 

Use an INSERT with an OUTPUT INTO clause to store the IDs of the inserted rows, then DELETE joining to this temp table to remove only those IDs:

DECLARE @TempTable TABLE (YourKeyValue KeyDatatype NOT NULL)

INSERT INTO EventsBackups
    (column1, column2, column3)
    OUTPUT INSERTED.primaryKeyValue
    INTO @TempTable
    SELECT TOP 1000
        column1, column2, column3
    FROM Events

DELETE Events
    FROM Events
    INNER JOIN @TempTable t ON Events.PrimaryKey = t.YourKeyValue
KM
I like this solution. Note that your final join will be: ON Events.PrimaryKey = t.primaryKeyValue rather than ON Events.PrimaryKey = t.YourKeyValue. Just to keep the example consistent ;-)
Aaron Alton
@Aaron Alton, t.YourKeyValue comes from my @TempTable, which I define in my code; there is no @TempTable.primaryKeyValue. The OUTPUT INSERTED.primaryKeyValue needs to be changed to INSERTED.<whatever his key column is called>.
KM
I really like this solution as well, except that there's no column that's a key. There can be repeat rows in the table with the same timestamp :( I really like this, though, and it's worth an upvote.
rwmnau
if there is no primary key, add one, make it an IDENTITY.
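For example (a sketch; the column name is arbitrary, and note that adding an IDENTITY column to a table with millions of rows is itself a sizeable operation):

ALTER TABLE dbo.Events
    ADD EventId INT IDENTITY(1,1) NOT NULL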
KM
See my post below. The OUTPUT clause is a good hint, but use it on the DELETE to return the deleted rows straight into the INSERT. Also see my blog about this: http://rusanu.com/2008/04/09/chained-updates/
Remus Rusanu
A: 

Here's what I ended up doing:

DECLARE @startdate     DATETIME   -- start of the range to archive
DECLARE @enddate       DATETIME   -- end of the range to archive
DECLARE @CleanseFilter DATETIME   -- upper bound of the current chunk

SET @CleanseFilter = @startdate
WHILE @CleanseFilter IS NOT NULL
BEGIN
    BEGIN TRANSACTION

        INSERT INTO ArchiveDatabase.dbo.MyTable
        SELECT *
          FROM dbo.MyTable
         WHERE startTime BETWEEN @startdate AND @CleanseFilter

        DELETE dbo.MyTable
         WHERE startTime BETWEEN @startdate AND @CleanseFilter

    COMMIT TRANSACTION

    -- advance the filter to the timestamp roughly 1000 rows further in
    SET @CleanseFilter = (SELECT MAX(startTime)
                            FROM (SELECT TOP 1000 startTime
                                    FROM dbo.MyTable
                                   WHERE startTime BETWEEN @startdate AND @enddate
                                   ORDER BY startTime) a)
END

I'm not pulling exactly 1000, just 1000ish, so it handles repeats in the time column appropriately (something I worried about when I considered using ROWCOUNT). Since there are often repeats in the time column, I see it regularly move 1002 or 1004 rows/iteration, so I know it's getting everything.

I'm submitting this as an answer so it can be judged up against the other solutions people have provided. Let me know if there's something obviously wrong with this method. Thanks for your help, everybody, and I'll accept whichever answer has the most votes in a few days.

rwmnau
If you have no key and don't want to add one, use my answer but change it up: do the DELETE with the OUTPUT INTO, capture all the columns into the temp table, and then insert from a SELECT of that temp table.
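A minimal sketch of that variation, assuming Events has only dtcol plus two placeholder data columns, and keeping the pair in a transaction so no deleted rows are lost if the insert fails:

DECLARE @DeletedRows TABLE (dtcol DATETIME, Col1 INT, Col2 VARCHAR(50))

BEGIN TRANSACTION

    -- delete a chunk and capture every deleted row
    DELETE TOP (1000) dbo.Events
        OUTPUT DELETED.dtcol, DELETED.Col1, DELETED.Col2
        INTO @DeletedRows
        WHERE dtcol <= @ArchiveDate

    -- archive exactly the rows that were deleted
    INSERT INTO dbo.EventsBackups (dtcol, Col1, Col2)
    SELECT dtcol, Col1, Col2
      FROM @DeletedRows

COMMIT TRANSACTION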
KM
You need to be very careful with the SQL you have posted. If you don't use SERIALIZABLE transaction isolation, your DELETE will not be guaranteed to only be deleting the rows your SELECT chose. Look up non-repeatable reads, and phantom reads. If you do go with the SQL you posted, the only way for SERIALIZABLE to be guaranteed by SQL server (without an index on the date column) is to table lock, which will kill performance like it has never been killed before!
Noel Kennedy
(Can't edit comments.) * That should read "for the SERIALIZABLE isolation level to be executed".
Noel Kennedy
A: 

Another option would be to add a trigger to the Events table that does nothing but add the same record to the EventsBackups table.

That way EventsBackups is always up to date, and all you have to do is periodically purge records from your Events table.
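
A minimal sketch of such a trigger, assuming the two tables have identical column lists:

CREATE TRIGGER trg_Events_CopyToBackup
ON dbo.Events
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON

    -- copy every newly inserted row straight into the backup table
    INSERT INTO dbo.EventsBackups
    SELECT * FROM inserted
END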

Ron

Ron Savage
+2  A: 

Just INSERT the result of the DELETE:

WHILE 1 = 1
BEGIN

    WITH EventsTop1000 AS (
        SELECT TOP 1000 *
          FROM Events
         WHERE <yourconditionofchoice>)
    DELETE EventsTop1000
        OUTPUT DELETED.*
        INTO EventsBackups;

    IF (@@ROWCOUNT = 0)
        BREAK;
END

This is atomic and consistent.

Remus Rusanu