views:

178

answers:

5

Hi there,

delete activities
where unt_uid is null

would be the fastest way, but nobody can access the database or the table until the statement has finished, so this is a no-go.

I defined a cursor to get this task done during working hours, but the impact on productivity is still too big. So how can I delete these records while normal use of the database is guaranteed?

It's a SQL Server 2005 instance on 32-bit Windows 2003. Second question: how long would you estimate this job to take (6 hours or 60 hours)? (Yes, I know that depends on the load, but assume this is a small-business environment.)

+7  A: 

You can do it in chunks. For example, every 10 seconds execute:

delete from activities where activityid in 
  (select top 1000 activityid from activities where unt_uid is null)

Obviously, pick the row count (I arbitrarily chose 1000) and interval (I chose 10 seconds) that make the most sense for your application.
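For illustration, here is a minimal sketch of running that in a loop; the table and column names are taken from the question, and the batch size and delay are arbitrary. SQL Server 2005 also accepts DELETE TOP (n) directly, so the IN subquery is optional:

DECLARE @batch int
SET @batch = 1000

WHILE 1 = 1
BEGIN
    -- delete one small batch; each batch commits as its own transaction
    DELETE TOP (@batch) FROM activities
    WHERE unt_uid IS NULL

    -- stop once nothing is left to delete
    IF @@ROWCOUNT = 0 BREAK

    -- pause so other sessions can get their locks in
    WAITFOR DELAY '00:00:10'
END

Because each batch is its own transaction, the log space can be reused between batches as long as log backups run (or the database is in SIMPLE recovery), which also helps with the logging concern raised in the comments.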

Keltex
Thanks for this hint. I would do it with a cursor that selects the top 1000 and wrap it in a while (select count(*) ...) > 0 loop. I think the transaction log is another story: it isn't possible to do the deletion without logging while normal use of the database is still being logged, is it?
Ice
@ice I don't think so. You can get minimal logging for bulk operations like bulk inserts, but not for deletes.
Keltex
+5  A: 

Perhaps instead of deleting the records from your table, you could create a new, identical table, insert the records you want to keep, and then rename the tables so the new one replaces the old one. This would still take some time, but the downtime would be pretty minimal (just while swapping the tables).
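A minimal sketch of that recipe, assuming the activities table from the question and that the rows to keep are those where unt_uid is not null (indexes, constraints, triggers, and permissions are not copied by SELECT INTO and would have to be recreated on the new table):

-- copy the rows worth keeping into a new table
SELECT *
INTO activities_new
FROM activities
WHERE unt_uid IS NOT NULL

-- brief outage: swap the tables
EXEC sp_rename 'activities', 'activities_old'
EXEC sp_rename 'activities_new', 'activities'

-- once everything checks out, drop the old data
-- DROP TABLE activities_old

SELECT ... INTO is minimally logged under the SIMPLE or BULK_LOGGED recovery models, which also keeps the transaction log small while the copy runs.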

Ray
+3  A: 

Who can access the table will depend on your transaction isolation level, I'd guess.

However, you're broadly right: a huge delete is bad, particularly if the WHERE clause can't use an index, because the database probably won't be able to lock only the rows it needs to delete and will end up taking a big lock on the whole table.

My best recommendation would be to redesign your application so you don't need to delete these rows, or possibly any rows.

You can do this either by partitioning the table so that you can simply drop partitions instead, or by using the "copy the rows you want to keep, then drop the table" recipe suggested by others.
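For reference, a rough sketch of the partition-drop approach on SQL Server 2005, assuming the table could be partitioned on a date column (activity_date and the boundary values are made up for illustration; the question's unt_uid IS NULL criterion would need a suitable partitioning column of its own):

-- one partition per month
CREATE PARTITION FUNCTION pfActivityMonth (datetime)
    AS RANGE RIGHT FOR VALUES ('20090101', '20090201', '20090301')

CREATE PARTITION SCHEME psActivityMonth
    AS PARTITION pfActivityMonth ALL TO ([PRIMARY])

-- the table and its clustered index would be created ON psActivityMonth(activity_date)

-- removing a partition's rows is then a metadata-only operation:
-- switch it into an empty staging table with the same structure and truncate that
ALTER TABLE activities SWITCH PARTITION 2 TO activities_staging
TRUNCATE TABLE activities_staging

The switch is nearly instantaneous because no data is moved, but note that the staging table must have the same structure and live on the same filegroup, and table partitioning requires Enterprise Edition on SQL Server 2005.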

MarkR
Partitioning is a great idea: http://msdn.microsoft.com/en-us/library/ms345146(SQL.90).aspx
Rubens Farias
Partitioning is ONE way of avoiding having to do the delete, but not the only one.
MarkR
A: 

In a small-business environment, it seems odd that you would need to delete 500,000 rows as part of standard operational behavior without affecting any other users. Typically for deletes that large, we make a new table and use TRUNCATE/INSERT or sp_rename to replace the old one.

Having said that, as a special case, one of my monthly processes regularly deletes 200m rows in batches of around 3m at a time if it detects that it needs to re-run the process which generated those 200m rows. But this is a single-user process in a dedicated data warehouse database, and I wouldn't call it a small-business scenario.

I second the answers recommending seeking alternative approaches to your design.

Cade Roux
A: 

I'd use the "nibbling delete" technique. From http://sqladvice.com/blogs/repeatableread/archive/2005/09/20/12795.aspx:

DECLARE @target int
SET @target = 2000
DECLARE @count int
SET @count = 2000

WHILE @count = 2000 BEGIN

 -- delete one batch of at most @target rows
 DELETE FROM myBigTable
 WHERE targetID IN
 (SELECT TOP (@target) targetID
  FROM myBigTable WITH(NOLOCK)
  WHERE something = somethingElse)

 -- remember how many rows this batch actually deleted
 SELECT @count = @@ROWCOUNT

 -- pause so other queries can get in between batches
 WAITFOR DELAY '00:00:00.200'

END

I've used it for exactly this type of scenario. The WAITFOR is important to keep; it allows other queries to do their work in between deletes.

Jonas Lincoln
Shouldn't you be checking `@count <> 0` instead? Otherwise, you could have a few rows left if the number of rows does not divide evenly by 2000.
Michael Madsen
It works, actually. Consider the case where the last DELETE is 1337 rows. The SELECT @count = @@rowcount will get you 1337, terminating the while-loop.
Jonas Lincoln
Of course, the @@rowcount will be off if there are any triggers or cascading deletes.
Jonas Lincoln
In general, this is not a bad idea, but it's kind of a black box. You should do it in small groups first to get an idea of the time each "nibble" takes.
Keltex
Yep, you have to figure out the size of each nibble. Too large and you'll lock everyone out. Too small and you spend too much time in the WAITFOR.
Jonas Lincoln