We have a query to remove some rows from the table based on an id field (primary key). It is a pretty straightforward query:

delete from OUR_TABLE where ID in (123, 345, ...)

The problem is that the number of IDs can be huge (e.g. 70k), so the query takes a long time. Is there any way to optimize this? (We are using Sybase, if that matters.)

+3  A: 

I'm wondering if parsing an IN clause with 70K items in it is a problem. Have you tried a temp table with a join instead?
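A minimal sketch of that approach in Sybase-style T-SQL (the temp table name and the loading step are assumptions, not tested code):

  -- load the 70k IDs into a temp table first (bcp, batched inserts, etc.)
  create table #ids_to_delete (id int not null)

  -- then delete via a join instead of a giant IN list
  delete OUR_TABLE
  from OUR_TABLE, #ids_to_delete
  where OUR_TABLE.ID = #ids_to_delete.id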

JohnOpincar
Don't know Sybase, but in SQL Server this would also be my first try at optimizing the delete.
Lieven
A: 

Does our_table have any ON DELETE CASCADE references?

Peter Miehle
+3  A: 

There are two ways to make statements like this one perform:

  1. Create a new table and copy over everything except the rows to delete. Swap the tables afterwards (alter table ...). I suggest giving it a try even if it sounds odd: some databases are much faster at copying than at deleting. See the sketch after this list.

  2. Partition your tables. Create N tables and use a view to union them into one (a partitioned view). Sort the rows into different tables grouped by the delete criterion. The idea is to drop a whole table instead of deleting individual rows.
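A rough sketch of option 1 in Sybase-style T-SQL (table names and the #ids_to_delete temp table are placeholders, and select into needs the "select into/bulkcopy" database option enabled, so treat this as an outline):

  -- copy everything except the rows to delete into a new table
  select *
  into OUR_TABLE_NEW
  from OUR_TABLE
  where ID not in (select id from #ids_to_delete)

  -- recreate indexes and constraints on OUR_TABLE_NEW, then swap the names
  exec sp_rename 'OUR_TABLE', 'OUR_TABLE_OLD'
  exec sp_rename 'OUR_TABLE_NEW', 'OUR_TABLE'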

Aaron Digulla
+1 this was going to be my suggestion.
Elijah
A: 

To test whether the IN clause is part of the problem, try deleting one entry using the IN clause and then the same entry without it (re-inserting the row after the first delete, obviously).

e.g.

delete from OUR_TABLE where ID in (123)

then...

delete from OUR_TABLE where ID = 123
kevchadders
+2  A: 

Can Sybase handle 70K arguments in an IN clause? All databases I have worked with have some limit on the number of arguments in an IN clause. For example, Oracle has a limit of around 1,000.

Can you use a subselect instead of the IN list? That will shorten the SQL, and it might help with such a large number of values. Something like this:

  DELETE FROM OUR_TABLE WHERE ID IN 
        (SELECT ID FROM somewhere WHERE some_condition)

Deleting a large number of records can be sped up with some interventions in the database, if the database model permits. Here are some strategies:

  1. You can speed things up by dropping indexes, deleting the records, and recreating the indexes afterwards. This avoids rebalancing index trees while the records are deleted.
    • drop all indexes on table
    • delete records
    • recreate indexes
  2. If you have lots of relations to this table, try disabling constraints, provided you are absolutely sure the delete will not break any integrity constraint. The delete will go much faster because the database won't be checking integrity. Re-enable the constraints after the delete.
    • disable integrity constraints, disable check constraints
    • delete records
    • enable constraints
  3. Disable triggers on the table, if you have any and your business rules allow it. Delete the records, then re-enable the triggers.

  4. Last, do as others suggested: make a copy of the table containing only the rows that are not to be deleted, then drop the original, rename the copy, and recreate the integrity constraints, if there are any.

I would try a combination of 1, 2 and 3. If that does not work, then 4. If everything is slow, I would look for a bigger box: more memory, faster disks.
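A rough sketch of combining 1-3 in Sybase-style T-SQL (all object names are made up, and the exact syntax for disabling constraints and triggers varies between versions, so treat this as an outline rather than a recipe):

  -- 1. drop nonclustered indexes (repeat per index)
  drop index OUR_TABLE.ix_our_table_name

  -- 2. drop or disable foreign keys that reference OUR_TABLE
  alter table CHILD_TABLE drop constraint fk_child_our_table

  -- 3. disable triggers on OUR_TABLE here, if business rules allow it

  delete OUR_TABLE
  from OUR_TABLE, #ids_to_delete
  where OUR_TABLE.ID = #ids_to_delete.id

  -- afterwards: recreate indexes, re-add constraints, re-enable triggers
  create index ix_our_table_name on OUR_TABLE (name)
  alter table CHILD_TABLE add constraint fk_child_our_table
        foreign key (our_table_id) references OUR_TABLE (ID)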

zendar
Disabling triggers is a very bad idea unless you are able to prevent other users from doing things on the database while they are disabled.
HLGEM
I know, that's why I wrote "if your business rules allow that". For example, I have seen lots of databases where triggers were used for some sort of change auditing. If he needs performance, then this kind of trigger can probably be eliminated.
zendar
+2  A: 

Find out what is using up the performance!

In many cases one of the solutions already provided might do. But there might be others (this is based on Oracle knowledge, so things will be different on other databases. Edit: just saw that you mentioned Sybase):

  • Do you have foreign keys on that table? Make sure the referring IDs are indexed.
  • Do you have indexes on that table? Dropping them before the delete and recreating them afterwards might be faster.
  • Check the execution plan (see the sketch after this list). Is it using an index where a full table scan might be faster, or the other way round? HINTS might help.
  • Instead of a select into new_table as suggested above, a create table as select might be even faster.
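One hedged way to look at the plan in Sybase ASE, assuming an isql-style client (the statement shown is only an example):

  set showplan on
  set statistics io on
  go
  delete from OUR_TABLE where ID in (123, 345)
  go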

But remember: Find out what is using up the performance first.

When you are using DDL statements, make sure you understand and accept the consequences they might have on transactions and backups.

Jens Schauder
+1  A: 

Try sorting the IDs you are passing into the IN clause into the same order as the table or index is stored in. You may then get more hits in the disk cache.

Putting the IDs to be deleted into a temp table, with the IDs sorted in the same order as the main table, may let the database do a simple scan over the main table.

You could try using more than one connection and splitting the work across the connections so as to use all the CPUs on the database server; however, think about which locks will be taken out, etc., first.
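A small sketch of the sorted temp-table idea (names are hypothetical; the clustered index keeps the temp table's rows in ID order, matching a main table that is clustered on ID):

  create table #ids_to_delete (id int not null)
  create clustered index ix_ids on #ids_to_delete (id)

  -- load the IDs, then delete via the join as in the earlier answers
  delete OUR_TABLE
  from OUR_TABLE, #ids_to_delete
  where OUR_TABLE.ID = #ids_to_delete.id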

Ian Ringrose
+1  A: 

Consider running this in batches. A loop deleting 1000 records at a time may be much faster than one query that does everything, and in addition it will not keep the table locked out from other users for as long a stretch.

If you have cascade deletes (and lots of foreign key tables affected) or triggers involved, you may need to run in even smaller batches. You'll have to experiment to see which number is best for your situation. I've had tables where I had to delete in batches of 100 and others where 50000 worked (fortunate in that case, as I was deleting a million records).

But in any event I would put the key values I intend to delete into a temp table and delete from there.
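A hedged sketch of the batch approach in Sybase-style T-SQL, using set rowcount to cap each pass (#ids_to_delete is the assumed temp table of keys; tune the batch size for your system):

  set rowcount 1000
  while 1 = 1
  begin
      delete OUR_TABLE
      from OUR_TABLE, #ids_to_delete
      where OUR_TABLE.ID = #ids_to_delete.id

      if @@rowcount = 0 break
  end
  set rowcount 0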

HLGEM
+1 for putting the IDs in a temp table (or a permanent work table might work too)
Tom H.