views:

60

answers:

1

Hi.

I need to delete many rows from sql server 2008 database, it must be scalable so i was thinking about bulk delete, the problem is that the are not many references on this, at least in my case.

The first factor is that i will exactly know the ID of every row to delete, so any tips with TOP are not an option, also i will delete less rows that i want to retain so there will be no need for some "drop/temp table/re-create" methods.

So i was thinking to use WHERE IN , either suppling IDs or xml data with IDs, there is also an option to use MERGE to delete rows.

If i will have to delete 1000+ rows, sending all ids to WHERE IN could be a problem ? And what with MERGE - it is really a cure for all bulk insert/update/delete problems ? What to choose ?

+2  A: 

One option would be to store the known ID's into a "controller" table, and then delete rows from your main data table that have their ID show up in the controller table.

That way, you could easily "batch" up your deletes, e.g.

DELETE FROM dbo.YourMainDataTable
WHERE ID IN (SELECT TOP (250) ID FROM dbo.DeleteControllerTable)

You could easily run this delete statement in e.g. a SQL Agent Job which comes around every 15 minutes to check if there's anything to delete. In the meantime, you could add more ID's to your DeleteController table, thus you could "decouple" the process of entering the ID's to be deleted from the actual deletion process.

marc_s
Yeh i was thinking about it also, but 250 rows per 15 minutes could be to less quickly, could there be some performance issues with WHERE IN when i will select to delete for example TOP 1k to 3k rows ?
Programista
@Programista: deleting large numbers of rows is **always** a performance issue - there's no way around that (except for not deleting). Whether you delete 250 or 1000 or 3000 at a time is up to you - try it, measure the performance, decide what to do. There's no magic formula to find out how many you can delete - you need to find out yourself, on your system, under your workload.
marc_s
@Programista - To avoid lock escalation I think 5000 is the magic number.
Martin Smith