I have a very large database (~100 GB), primarily consisting of two tables I want to reduce in size (both of which have approx. 50 million records). I have an archive DB set up on the same server containing these two tables with the same schema. I'm trying to determine the best conceptual approach to removing the rows from the live DB and inserting them into the archive DB. In pseudocode, this is what I'm doing now:

DECLARE @NextIDs TABLE(UniqueID)
DECLARE @TwoYearsAgo = two years before today's date

INSERT INTO @NextIDs
     SELECT TOP 100 UniqueID FROM myLargeTable WHERE myLargeTable.actionDate < @TwoYearsAgo

INSERT INTO myArchiveTable (<fields>)
SELECT <fields>
FROM myLargeTable INNER JOIN @NextIDs ON myLargeTable.UniqueID = @NextIDs.UniqueID

DELETE myLargeTable
FROM myLargeTable INNER JOIN @NextIDs ON myLargeTable.UniqueID = @NextIDs.UniqueID

Right now this takes a horrifically slow 7 minutes to complete 1000 records. I've tested the DELETE and the INSERT separately; each takes approx. 3.5 minutes to complete, so it's not that one is drastically more inefficient than the other. Can anyone point out some optimization ideas here?

Thanks!

This is SQL Server 2000.

Edit: On the large table there is a clustered index on the ActionDate field. There are two other indexes, but neither is referenced in any of the queries. The archive table has no indexes. On my test server, this is the only query hitting the SQL Server, so it should have plenty of processing power.

Code (this loops through the table in batches of 1000 records at a time):

DECLARE @NextIDs TABLE(UniqueID int PRIMARY KEY)
DECLARE @TwoYearsAgo datetime
SELECT @TwoYearsAgo = DATEADD(d, (-2 * 365), GETDATE())

WHILE EXISTS (SELECT TOP 1 UserName FROM [ISAdminDB].[dbo].[UserUnitAudit] WHERE [ActionDateTime] < @TwoYearsAgo)
BEGIN

    BEGIN TRAN

    --get all records to be archived
    INSERT INTO @NextIDs(UniqueID)
    SELECT TOP 1000 UniqueID
    FROM [ISAdminDB].[dbo].[UserUnitAudit]
    WHERE [UserUnitAudit].[ActionDateTime] < @TwoYearsAgo

    --insert into archive table
    INSERT INTO [ISArchive].[dbo].[UserUnitAudit] (<Fields>)
    SELECT <Fields>
    FROM [ISAdminDB].[dbo].[UserUnitAudit] AS a
    INNER JOIN @NextIDs AS b ON a.UniqueID = b.UniqueID

    --remove from Admin DB
    DELETE [ISAdminDB].[dbo].[UserUnitAudit]
    FROM [ISAdminDB].[dbo].[UserUnitAudit] AS a
    INNER JOIN @NextIDs AS b ON a.UniqueID = b.UniqueID

    DELETE FROM @NextIDs

    COMMIT

END
+1  A: 

Are there any indexes on myLargeTable.actionDate and .UniqueID?

Jonas Lincoln
There is a clustered index on the actionDate, but nothing on the UniqueID. There are no indexes on the archive table it is being inserted into.
Kevin
You need an index on myLargeTable.UniqueId for the JOINs. Check the execution plan in the Query Analyzer and you'll probably see table scans.
Jonas Lincoln
Thanks Jonas, I'm going to go talk to the guy that designed the DB to see why we have no index on the uniqueID field. Seems like it would make sense...
Kevin
+1  A: 

Have you tried larger batch sizes than 100?

What is taking the most time? The INSERT, or the DELETE?

Joe
When I crank the batch size up to 1000, both the Insert and the Delete take about three and a half minutes to complete when run separately. The initial insert into @NextIDs takes only a second.
Kevin
+3  A: 

You effectively have three SELECTs that need to run before your insert/delete commands are executed:

for the 1st insert:

SELECT TOP 100 UniqueID FROM myLargeTable WHERE myLargeTable.actionDate < @TwoYearsAgo

for the 2nd insert:

SELECT <fields> FROM myLargeTable INNER JOIN NextIDs 
on myLargeTable.UniqueID = NextIDs.UniqueID

for the delete:

(select *)
FROM MyLargeTable INNER JOIN NextIDs on myLargeTable.UniqueID = NextIDs.UniqueID

I'd try to optimize these first; if they are all quick, then the indexes may be what's slowing down your writes. Some suggestions:

  1. start Profiler and see what's happening with the reads/writes, etc.

  2. check index usage for all three statements.

  3. try running the SELECTs returning only the PK, to see whether the delay is query execution or fetching the data (do you have, e.g., any full-text-indexed fields, TEXT fields, etc.?) -- see the sketch below.
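A rough sketch of step 3, using the table and column names from the question's edit (the STATISTICS options just report per-statement I/O and timings):

-- Compare key-only retrieval against full-row retrieval for the same batch.
DECLARE @TwoYearsAgo datetime
SELECT @TwoYearsAgo = DATEADD(d, -2 * 365, GETDATE())

SET STATISTICS IO ON
SET STATISTICS TIME ON

-- Key only: if this is fast, the rows are being found cheaply.
SELECT TOP 1000 UniqueID
FROM [ISAdminDB].[dbo].[UserUnitAudit]
WHERE [ActionDateTime] < @TwoYearsAgo

-- All columns: if this is much slower, the cost is in fetching the data
-- (wide rows, TEXT columns, etc.) rather than in locating it.
SELECT TOP 1000 *
FROM [ISAdminDB].[dbo].[UserUnitAudit]
WHERE [ActionDateTime] < @TwoYearsAgo

SET STATISTICS IO OFF
SET STATISTICS TIME OFF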

davek
+4  A: 

Do you have an index on the source table for the column which you're using to filter the results? In this case, that would be the actionDate.

Also, it can often help to remove all indexes from the destination table before doing massive inserts, but in this case you're only doing hundreds of rows at a time.

You would also probably be better off doing this in larger batches. With one hundred rows at a time, the overhead of the queries ends up dominating the cost/time.

Is there any other activity on the server during this time? Is there any blocking happening?

Hopefully this gives you a starting point.

If you can provide the exact code that you're using (maybe without the column names if there are privacy issues) then maybe someone can spot other ways to optimize.

EDIT: Have you checked the query plan for your block of code? I've run into issues with table variables like this where the query optimizer couldn't figure out that the table variable would be small, so it always did a full table scan on the base table.

In my case it eventually became a moot point, so I'm not sure what the ultimate solution is. You can certainly add a condition on the actionDate to all of your SELECT queries, which would at least minimize the effects of this (see the sketch below).
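For example, a minimal sketch of that extra condition, reusing the table names and variables from the question's code (assumed, not tested against the original schema):

-- Repeat the date filter on the joins so the clustered index on
-- ActionDateTime can narrow the scan even if the optimizer misjudges
-- the @NextIDs table variable.
INSERT INTO [ISArchive].[dbo].[UserUnitAudit] (<Fields>)
SELECT <Fields>
FROM [ISAdminDB].[dbo].[UserUnitAudit] AS a
INNER JOIN @NextIDs AS b ON a.UniqueID = b.UniqueID
WHERE a.[ActionDateTime] < @TwoYearsAgo

DELETE a
FROM [ISAdminDB].[dbo].[UserUnitAudit] AS a
INNER JOIN @NextIDs AS b ON a.UniqueID = b.UniqueID
WHERE a.[ActionDateTime] < @TwoYearsAgo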

The other option would be to use a normal table to hold the IDs.

Tom H.
See my edit in the OP. I think this answers all your questions.
Kevin
A: 

You might try doing this using the OUTPUT clause:

declare @items table (
  <field list just like source table> )

delete top (100) from source_table
  output deleted.first_field, deleted.second_field, etc.
  into @items
  where <conditions>

insert archive_table (<fields>)
  select <fields> from @items

You also might be able to do this in a single query, by doing OUTPUT ... INTO directly into the archive table (eliminating the need for the table variable).
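A rough sketch of that single-statement form (placeholder names as above; note the OUTPUT clause requires SQL Server 2005 or later, so it isn't available on the SQL Server 2000 instance in the question, and the OUTPUT INTO target table can't have enabled triggers):

-- Delete a batch and route the removed rows straight into the archive
-- table in one statement (hypothetical names; needs SQL Server 2005+).
delete top (100) from source_table
  output deleted.first_field, deleted.second_field
  into archive_table (first_field, second_field)
  where <conditions>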

Ray
+1  A: 

The INSERT and DELETE statements are joining on

[ISAdminDB].[dbo].[UserUnitAudit].UniqueID

If there's no index on this, and you indicate there isn't, you're doing two table scans. That's likely the source of the slowness, because each scan has to read the entire table to find the matching rows rather than seeking on an index.

I think you need to add an index on UniqueID. The cost of maintaining it has got to be less than the cost of repeated table scans, and you can drop it after your archive is done.
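A minimal sketch of that index, using the table name from the question's code (the index name is made up, and UNIQUE assumes UniqueID really is unique; drop that keyword if it isn't):

USE [ISAdminDB]
GO
-- Nonclustered index to support the joins on UniqueID.
CREATE UNIQUE NONCLUSTERED INDEX IX_UserUnitAudit_UniqueID
    ON [dbo].[UserUnitAudit] (UniqueID)
GO
-- Once the archiving run is finished it can be dropped again
-- (SQL Server 2000 syntax):
-- DROP INDEX [UserUnitAudit].[IX_UserUnitAudit_UniqueID]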

DaveE
This actually led me to the solution. Instead of keeping track of the rows I needed to move by UniqueID, which has no index, I simply used the WHERE [ActionDateTime] < @TwoYearsAgo clause on my insert and delete, and voilà, much faster.
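One possible shape of that date-driven version, as a rough sketch only (this is an assumption about the final code, not a copy of it): pick a per-batch cutoff on the clustered ActionDateTime key so the INSERT and DELETE touch exactly the same rows without tracking IDs.

DECLARE @TwoYearsAgo datetime, @BatchCutoff datetime
SELECT @TwoYearsAgo = DATEADD(d, -2 * 365, GETDATE())

WHILE EXISTS (SELECT TOP 1 UniqueID
              FROM [ISAdminDB].[dbo].[UserUnitAudit]
              WHERE [ActionDateTime] < @TwoYearsAgo)
BEGIN
    -- Find the date bounding roughly the next 1000 oldest rows; the
    -- clustered index on ActionDateTime makes this a cheap range scan.
    -- (Batches can exceed 1000 if many rows share the same timestamp.)
    SELECT @BatchCutoff = MAX(ActionDateTime)
    FROM (SELECT TOP 1000 ActionDateTime
          FROM [ISAdminDB].[dbo].[UserUnitAudit]
          WHERE [ActionDateTime] < @TwoYearsAgo
          ORDER BY ActionDateTime) AS oldest

    BEGIN TRAN

    INSERT INTO [ISArchive].[dbo].[UserUnitAudit] (<Fields>)
    SELECT <Fields>
    FROM [ISAdminDB].[dbo].[UserUnitAudit]
    WHERE [ActionDateTime] <= @BatchCutoff

    DELETE FROM [ISAdminDB].[dbo].[UserUnitAudit]
    WHERE [ActionDateTime] <= @BatchCutoff

    COMMIT
END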
Kevin