Hi, I have a 50 GB MySQL database (80 tables) from which I need to delete some content. I have a reference table that contains a list of product IDs that need to be deleted from the other tables.

The other tables can be 2 GB each, and they contain the items that need to be deleted.

  1. My question is: since it is not a small database, what is the safest way to delete the data in one shot in order to avoid problems?

  2. What is the best method to verify that all of the data was deleted?

Some code examples would probably be helpful.

Thanks

+1  A: 

This probably doesn't help anymore, but you should keep it in mind when designing a database. In MySQL (depending on the table storage engine, for instance InnoDB) you can specify relations, called foreign key constraints. With these relations, when you delete a row from one table (for instance products), MySQL can automatically update or delete the rows in other tables that reference that row through a foreign key (such as product_storage). These constraints guarantee that references between tables stay consistent. However, they might be hard to add in hindsight. If you plan to do this more often, it is definitely worth researching whether you can add them to your database; they will save you a lot of work (all kinds of queries become simpler).
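
As a rough illustration of such a constraint, here is a minimal sketch. It assumes both tables use InnoDB, that products has a primary key column named id, and that product_storage has a product_id column; the column names and the constraint name are made up for the example and are not from the question.

```sql
-- Hypothetical schema: products(id) is the parent, product_storage(product_id) the child.
-- The ALTER fails if product_storage already contains product_id values
-- that do not exist in products, so orphans have to be cleaned up first.
ALTER TABLE product_storage
    ADD CONSTRAINT fk_product_storage_product
    FOREIGN KEY (product_id) REFERENCES products (id)
    ON DELETE CASCADE;

-- With the constraint in place, deleting a product also removes
-- its rows in product_storage.
DELETE FROM products WHERE id = 42;
```

On tables of around 2 GB, adding the constraint can itself take a long time, so it is worth trying on a copy first.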

Without these relations you can't be 100% sure. So you'd have to go over all the tables, note which columns you want to check, and write a bunch of SQL queries to make sure there are no entries left.
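
Such checks could be sketched roughly as follows. The reference table name deleted_products, the column name product_id, and the table product_prices are assumptions for illustration only; adjust them to the real schema.

```sql
-- Find the tables in the current database that have a product_id column
-- (assumes the column carries the same name everywhere).
SELECT table_name
FROM information_schema.columns
WHERE table_schema = DATABASE()
  AND column_name = 'product_id';

-- After the delete, count rows that still match the reference list.
-- Every count should be 0.
SELECT 'product_storage' AS table_name, COUNT(*) AS leftover
FROM product_storage ps
JOIN deleted_products d ON d.product_id = ps.product_id
UNION ALL
SELECT 'product_prices', COUNT(*)
FROM product_prices pp
JOIN deleted_products d ON d.product_id = pp.product_id;
```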

Thirler
A: 

I agree with Thirler that using foreign keys is preferable. They guarantee referential integrity and consistency of the whole database.
I can believe that real life sometimes requires trickier logic, so you could use manual queries like

delete from a where id in (select id from `keys`);

(KEYS is a reserved word in MySQL, so a table with that name has to be quoted with backticks.) You could delete all records at once, by ranges of keys, or in batches using LIMIT in DELETE. A proper index on the key column is a must. To verify consistency you need a function or a query. For example:

delimiter //
create function check_consistency() returns boolean
reads sql data
begin
   -- true when no child row refers to an id that is missing from parent
   return not exists(select * from child where id not in (select id from parent))
       and not exists(select * from child2 where id not in (select id from parent));
   -- and so on
end//
delimiter ;
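
The LIMIT variant mentioned above could look something like this. It is only a sketch: the procedure name and the batch size of 10000 are arbitrary assumptions, and a and `keys` are the placeholder table names from the query above.

```sql
DELIMITER //
CREATE PROCEDURE delete_in_batches()
BEGIN
    DECLARE rows_deleted INT DEFAULT 0;
    REPEAT
        -- Delete at most 10000 matching rows per statement.
        DELETE FROM a WHERE id IN (SELECT id FROM `keys`) LIMIT 10000;
        -- ROW_COUNT() reports how many rows the DELETE removed.
        SET rows_deleted = ROW_COUNT();
    UNTIL rows_deleted = 0 END REPEAT;
END//
DELIMITER ;

CALL delete_in_batches();
```

With autocommit on, each batch commits on its own, which keeps individual transactions small but means the whole run cannot be rolled back as one unit.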
A: 

Another thing to look into is partitioning of MySQL tables. For more information, check out the reference manual:

http://dev.mysql.com/doc/refman/5.1/en/partitioning.html

It comes down to this: you can divide a table into partitions, for example by ranges of datetime values or by sets of index values.
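
As a small sketch of the idea, with a made-up table (product_log and its columns are not from the question): once a table is partitioned by range, a whole partition can be dropped in one statement instead of deleting its rows one by one. This only helps when the rows to remove line up with partition boundaries.

```sql
-- Hypothetical table partitioned by the year of a DATE column.
CREATE TABLE product_log (
    product_id INT NOT NULL,
    created    DATE NOT NULL,
    note       VARCHAR(255)
)
PARTITION BY RANGE (YEAR(created)) (
    PARTITION p2008 VALUES LESS THAN (2009),
    PARTITION p2009 VALUES LESS THAN (2010),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- Removes every row from 2008 and earlier along with the partition itself.
ALTER TABLE product_log DROP PARTITION p2008;
```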

Pokepoke
+1  A: 

As Thirler has pointed out, it would be nice if you had foreign keys. Without them, burnall's solution can be used with transactions to ensure that no inconsistencies creep in.

Regardless of how you do it, this could take a long time, even hours, so please be prepared for that.

e4c5
A: 

Hi!

As pointed out earlier, foreign keys would be nice here. But regarding question 1, you could perhaps run the changes within a transaction from the MySQL prompt. This assumes you are using a transaction-safe storage engine like InnoDB. You can convert from MyISAM to InnoDB if you need to. Anyway, something like this:

START TRANSACTION;

-- ...perform changes...
-- ...check the changes...

COMMIT;
-- ...or...
ROLLBACK;
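
Filled in, the skeleton above might look roughly like this. The table names a and `keys` are borrowed from the earlier answer as placeholders, and the leftover count is just one possible check.

```sql
-- Only needed if the table is still MyISAM. ALTER TABLE causes an
-- implicit commit, so it has to run before the transaction starts.
ALTER TABLE a ENGINE = InnoDB;

START TRANSACTION;

DELETE FROM a WHERE id IN (SELECT id FROM `keys`);

-- Should return 0 if every listed row is gone.
SELECT COUNT(*) AS leftover
FROM a
WHERE id IN (SELECT id FROM `keys`);

-- If the result looks right:
COMMIT;
-- Otherwise, run this instead of COMMIT:
-- ROLLBACK;
```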

Is it acceptable to have any downtime?

When working with PostgreSQL databases larger than 250 GB, we use this technique on production servers to perform database changes. If the outcome isn't as expected, we just roll back the transaction. Of course there is a penalty, as the I/O system has to work a bit harder.

// John

John P