Edit: Please answer one of the two questions I ask. I know there are other options that would be better in a different case. These other potential options (partitioning the table, running it as one large delete statement without committing in batches, etc.) are NOT options in my case due to things outside my control.

I have several very large tables to delete from. All have the same foreign key that is indexed. I need to delete certain records from all tables.

table source
  id --primary_key
  import_source --used for choosing the ids to delete

table t1
  id --foreign key
  --other fields

table t2
  id --foreign key
  --different other fields

Usually when doing a delete like this, I'll put together a loop to step through all the ids:

declare
  my_counter integer := 0;
begin
  for cur in (
    select id from source where import_source = 'bad.txt'
  ) loop
    delete from source where id = cur.id;
    delete from t1 where id = cur.id;
    delete from t2 where id = cur.id;
    my_counter := my_counter + 1;
    if my_counter > 500 then
      my_counter := 0;
      commit;
    end if;
  end loop;
  commit;
end;

However, in some code I saw elsewhere, it was put together in separate loops, one for each delete.

declare
  type import_ids is table of integer index by pls_integer;
  my_import_ids import_ids;
  my_count integer := 0;
begin
  select id bulk collect into my_import_ids
    from source where import_source = 'bad.txt';

  for h in 1..my_import_ids.count loop
    delete from t1 where id = my_import_ids(h);
    --do commit check
  end loop;
  for h in 1..my_import_ids.count loop
    delete from t2 where id = my_import_ids(h);
    --do commit check
  end loop;
end;

The --do commit check placeholder would be replaced with the same commit-every-500-rows chunk as in the first block above.
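
Spelled out, that placeholder is just the counter logic from the first block (using the my_count variable declared above):

my_count := my_count + 1;
if my_count > 500 then
  my_count := 0;
  commit;
end if;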

So I need one of the following answered:

1) Which of these is better?

2) How can I find out which is better for my particular case? (IE if it depends on how many tables I have, how big they are, etc)

Edit:

I must do this in a loop due to the size of these tables. I will be deleting thousands of records from tables with hundreds of millions of records. This is happening on a system that can't afford to have the tables locked for that long.

EDIT:

NOTE: I am required to commit in batches. The amount of data is too large to do in one transaction; the rollback generated would crash our database.

If there is a way to commit in batches other than looping, I'd be willing to hear it. Otherwise, don't bother saying that I shouldn't use a loop...

+7  A: 

Why loop at all?

delete from t1 where id IN (select id from source where import_source = 'bad.txt');
delete from t2 where id IN (select id from source where import_source = 'bad.txt');
delete from source where import_source = 'bad.txt';

That's standard SQL. I don't know Oracle specifically, but many DBMSes also feature multi-table JOIN-based DELETEs that would let you do the whole thing in a single statement.

Larry Lustig
+1 and a Coke for beating me for a couple of secs.
Otávio Décio
Not possible. Due to how large the tables are I *must* do this in a loop, committing as I go.
David Oneill
I'm not an Oracle person so I can't dispute your DBA's statement, but I find it hard to believe that Oracle doesn't provide some kind of setting to commit those DELETEs as the command executes. It's nuts that you can't issue such a simple SQL command against a major SQL database.
Larry Lustig
Hmm. I'll ask our DBA about that.
David Oneill
+1 Large DELETE are not a problem, just size your rollback segments appropriately
Vincent Malgrat
Large deletes are a problem. Our DBA has forbidden them. I am going to ask him whether there is a setting that commits each step (as Larry suggested a few comments back), but otherwise this is not possible.
David Oneill
A complicating factor is not only the rollback for the delete, but the rollbacks for all the inserts and updates that happen WHILE THE DELETE IS RUNNING. Unless I hear otherwise from our DBA, one large delete is NOT an option.
David Oneill
Oracle does indeed support deletes against an inline view as long as it is key-preserved.
David Aldridge
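
As a sketch of what that inline-view form looks like (source.id is the primary key, so the join should be key-preserved with respect to t1; a similar statement would handle t2):

-- delete the matching t1 rows through a key-preserved join view
delete from (
  select t1.id
    from t1
    join source s on s.id = t1.id
   where s.import_source = 'bad.txt'
);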
+1  A: 

First of all, you shouldn't commit in the loop - it is not efficient (generates lots of redo) and if some error occurs, you can't roll back.

As mentioned in previous answers, you should issue single deletes, or, if you are deleting most of the records, it may be more efficient to create new tables with the remaining rows, drop the old ones, and rename the new ones to the old names.

Something like this:

CREATE TABLE new_table AS SELECT * FROM old_table WHERE <filter only remaining rows>;

-- recreate indexes on new_table
-- re-issue grants on new_table
-- add constraints on new_table
-- etc. on new_table

DROP TABLE old_table;
RENAME new_table TO old_table;
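
Applied to this case, the filter on one of the big tables might look like this (a sketch; t1_new is an illustrative name):

-- keep only the t1 rows that are NOT flagged by the bad import
CREATE TABLE t1_new AS
SELECT t1.*
  FROM t1
 WHERE NOT EXISTS (SELECT 1
                     FROM source s
                    WHERE s.id = t1.id
                      AND s.import_source = 'bad.txt');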

See also Ask Tom

Majkel
I *must* commit in a loop. This is an order from our DBA: if I don't, the rollback information will be too large and will crash our system. I am deleting a small minority of the records, so the create-new-and-drop approach is not feasible.
David Oneill
Committing in a loop can give you ORA-01555 errors, not to mention poorer performance. Your DBA should rather increase the rollback segment size.
Majkel
Not feasible. I expect this delete to take hours to run. We can't have tables on our production database locked for that long.
David Oneill
With the constraints you mentioned, I think you will be best off with egorius's solution. It does what you want while minimizing the number of context switches. I would recommend testing for the maximum number of operations between commits - with that many rows to delete, it should give a measurable performance boost (less redo means fewer writes to the redo log, among other things).
Majkel
+1  A: 

Larry Lustig is right that you don't need a loop. Nonetheless there may be some benefit in doing the delete in smaller chunks. Here PL/SQL bulk binds can improve speed greatly:

declare
  type import_ids is table of integer index by pls_integer;
  my_import_ids import_ids;
begin
  select id bulk collect into my_import_ids
    from source where import_source = 'bad.txt';

  forall h in 1..my_import_ids.count
    delete from t1 where id = my_import_ids(h);
  forall h in 1..my_import_ids.count
    delete from t2 where id = my_import_ids(h);
end;

The way I wrote it, it does it all at once, in which case, yes, the single SQL statement is better. But you can change your loop conditions to break it into chunks. The key points are:

  • don't commit on every row. If anything, commit only every N rows.
  • When using chunks of N, don't run the delete in an ordinary loop. Use forall to run the delete as a bulk bind, which is much faster.

The reason, aside from the overhead of commits, is that each time you execute an SQL statement inside PL/SQL code it essentially does a context switch. Bulk binds avoid that.
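
One way to break it into chunks while keeping forall, sketched under the assumption of 500 ids per commit (collect all the ids once, then delete and commit a slice of the array at a time):

declare
  type import_ids is table of integer index by pls_integer;
  my_import_ids import_ids;
  lo pls_integer;
  hi pls_integer;
begin
  select id bulk collect into my_import_ids
    from source where import_source = 'bad.txt';

  lo := 1;
  while lo <= my_import_ids.count loop
    hi := lo + 499;
    if hi > my_import_ids.count then
      hi := my_import_ids.count;
    end if;
    -- one bulk-bound delete per table per chunk, then one commit
    forall h in lo..hi
      delete from t1 where id = my_import_ids(h);
    forall h in lo..hi
      delete from t2 where id = my_import_ids(h);
    commit;
    lo := hi + 1;
  end loop;
end;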

Dan
Your first point: yes, I commit every 500 or so rows.
David Oneill
Second point: can you provide more details on how forall works? How can your query be rewritten to use forall but still commit in batches?
David Oneill
A: 

You may try partitioning anyway to use parallel execution, not just to drop one partition. The Oracle documentation may prove useful in setting this up. Each partition would use its own rollback segment in this case.

Adam Hawkes
Partitioning the t1, t2, etc tables is not an option due to considerations outside my control.
David Oneill
Well, then I answer like this: 1. Both your solutions are optimal for your circumstances. 2. You can determine which is better by testing: do both in a simulated environment and see what happens.
Adam Hawkes
+6  A: 

David, if you insist on committing, you can use the following code:

declare
  type import_ids is table of integer index by pls_integer;
  my_import_ids import_ids;
  cursor c is select id from source where import_source = 'bad.txt';
begin
  open c;
  loop
    fetch c bulk collect into my_import_ids limit 500;
    forall h in 1..my_import_ids.count
      delete from t1 where id = my_import_ids(h);
    forall h in 1..my_import_ids.count
      delete from t2 where id = my_import_ids(h);
    commit;
    exit when c%notfound;
  end loop;
  close c;
end;

This program fetches ids in batches of 500 rows, deleting and committing each batch. It should be much faster than row-by-row processing, because bulk collect and forall each work as a single operation (a single round trip to and from the database), thus minimizing the number of context switches. See Bulk Binds, Forall, Bulk Collect for details.

egorius
With the commit inside the cursor loop, I wonder whether it might be best to introduce an ORDER BY into the cursor select in order to make sure that all values have been read from the source table prior to committing, hopefully reducing the chances of a "snapshot too old" error. You'd want to check the execution plan to make sure that a sort is being performed, of course.
David Aldridge
David Aldridge: Hmm, do we need to delete rows from the "source" table as well? I overlooked it. But ORDER BY what? We'd like to sort rows so that we read them block by block, never returning to a previous block, right? I think a FULL SCAN will do that.
egorius
A: 

If you are doing the delete from the source before the t1/t2 deletes, that suggests you don't have referential integrity constraints (as otherwise you'd get errors saying child records exist).

I'd go for creating the constraint with ON DELETE CASCADE. Then a simple

DECLARE
  v_cnt NUMBER := 1;
BEGIN
  WHILE v_cnt > 0 LOOP
   DELETE FROM source WHERE import_source = 'bad.txt' and rownum < 5000;
   v_cnt := SQL%ROWCOUNT;
   COMMIT;
  END LOOP;
END;

The child records would get deleted automatically.
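
A sketch of adding such constraints, with illustrative constraint names (this assumes t1.id and t2.id reference source.id):

-- constraint names here are illustrative
ALTER TABLE t1 ADD CONSTRAINT t1_source_fk
  FOREIGN KEY (id) REFERENCES source (id)
  ON DELETE CASCADE;

ALTER TABLE t2 ADD CONSTRAINT t2_source_fk
  FOREIGN KEY (id) REFERENCES source (id)
  ON DELETE CASCADE;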

If you can't have the ON DELETE CASCADE, I'd go with a GLOBAL TEMPORARY TABLE with ON COMMIT DELETE ROWS:

DECLARE
  v_cnt NUMBER := 1;
BEGIN
  WHILE v_cnt > 0 LOOP
   INSERT INTO temp (id)
   SELECT id FROM source WHERE import_source = 'bad.txt' and rownum < 5000;
   v_cnt := SQL%ROWCOUNT;
   DELETE FROM t1 WHERE id IN (SELECT id FROM temp);
   DELETE FROM t2 WHERE id IN (SELECT id FROM temp);
   DELETE FROM source WHERE id IN (SELECT id FROM temp);
   COMMIT;
  END LOOP;
END;
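
This assumes a temporary table along these lines (a sketch; the ON COMMIT DELETE ROWS clause clears it at each commit):

-- scratch table holding one batch of ids; rows vanish at each COMMIT
CREATE GLOBAL TEMPORARY TABLE temp (id INTEGER)
ON COMMIT DELETE ROWS;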

I'd also go for the largest chunk your DBA will allow. I'd expect each transaction to last for at least a minute. More frequent commits would be a waste.

"This is happening on a system that can't afford to have the tables locked for that long."

Oracle doesn't lock tables, only rows. I'm assuming no-one will be locking the rows you are deleting (or at least not for long). So locking is not an issue.

Gary