views:

348

answers:

5

so basically here's what i want to do: i have an account table, i have a list of acct_id: (3, 24, 515, 6326, 17), assuming i have about 100,000 accounts in the table, what's the most effective way to delete all the other rows besides the one with the account_id in my given list?

i came up with something like:

delete from account where acct_id is not in (3, 24, 515, 6326, 17);

i heard this query is slow because it's recursive or something. consider the number of rows i have, that would be VERY SLOW. what's a better way to do this?

+3  A: 
delete from table
 where not acct_id in (3, 24, 515, 6326, etc.);

Depending on the database flavor, indexes, distributed or not, etc., this could be a lot of work. The alternative which works efficiently even in fully-journaled databases is:

create table2 temp as /* create new table from the rows to keep */
   select *
   from table
   where acct_id in (3, 24, 515, 6326, etc.);
drop table;           /* discard table */
create table as       /* copy new table to rename */
  select * from table2;
drop table2;          /* get rid of temporary table */
wallyk
that seems almost the same as the solution i came up with...care to explain a little more how this is effective on a large volumes of data?
fei
I didn't see the SQL statement when I wrote the answer. Did you add that later?
wallyk
i didn't put it in a code tag earlier. but the alternative you suggested seems promising. thx.
fei
As he's using MySQL the last step can be `RENAME TABLE table2 TO table;`
Greg K
The last 2 steps even.
Greg K
Beware of the side effects of dropping tables: index will need to be recreated and other constraints. I prefer to use TRUNCATE TABLE. Posted a solution below.
Yada
A: 

Your query seems fine to me, but take a look at Explain if you are trying to optimize your queries.

Taylor Leese
A: 

If you have got an index on acct_id I cannot see any reason why your query should be slow. As far as I know

in (3, 24, 515, 6326, 17)

is just syntactic sugar for

acct_id != 3 AND acct_id != 24 ...

which should be fast enough.

middus
A: 

Not specific to MySQL, but deletes in general are relatively expensive because they require the engine to do a bunch of selects to make sure it is deleting the right records as well as the actual deletes. You also wind up with a lot of transactions added to the transaction logs (depending on engine and settings of course).

If you only have a small set of records you want to keep and a large set you want to throw out, then you can get much fast performance by cheating...

You copy the records you want to keep and either drop or truncate the table, then add the "keepers" back in.

jeffa00
A: 

My solution is to avoid DELETE and use TRUNCATE table because when you delete the database does two operations. delete and write records into rollback segments.

Of course, this means there is no rollback when you are truncating.

-- copy the few records into a temp table
select into temp 
 select * from account
 where acct_id in (3, 24, 515, 6326, 17);

-- truncate is super fast
truncate table account;

-- put back the few records
insert into account select * from temp;

drop table temp;
Yada