views: 147
answers: 4

I need to delete a huge chunk of data from my production database, which is about 100 GB in size. If possible, I would like to minimize downtime.

My selection criterion for deleting is likely to be

DELETE * FROM POSTING WHERE USER.ID=5 AND UPDATED_AT<100

What is the best way to delete it?

  • Build an index?
  • Write a sequential script that deletes via paginating through the rows 1000 at a time?
+1  A: 

If at all possible, use row-level binary logging rather than statement-level binary logging (it reduces the number of locks), at least during this operation. Perform your deletes in batches (1000 is a decent size). Use the primary key as the criterion for each batch, and order by the primary key so that you delete rows that are physically close to each other.
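
A minimal sketch of one such batch, assuming POSTING has an auto-increment primary key id and that the question's predicate maps to columns named user_id and updated_at (all three names are assumptions):

    -- switch this session to row-based binary logging for the operation
    SET SESSION binlog_format = 'ROW';

    -- run from a script in a loop; stop once the statement affects 0 rows
    DELETE FROM POSTING
    WHERE user_id = 5 AND updated_at < 100
    ORDER BY id      -- walk the primary key so each batch hits adjacent rows
    LIMIT 1000;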

Eric J.
+2  A: 

You can try the method mentioned in the MySQL docs:

  1. Select the rows not to be deleted into an empty table that has the same structure as the original table:

    INSERT INTO t_copy SELECT * FROM t WHERE ... ;

  2. Use RENAME TABLE to atomically move the original table out of the way and rename the copy to the original name:

    RENAME TABLE t TO t_old, t_copy TO t;

  3. Drop the original table:

    DROP TABLE t_old;
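
Applied to the table from the question, the sequence would look roughly like this (the column names user_id and updated_at are guesses from the question's query; note that the WHERE clause is inverted because you copy the rows you want to keep):

    CREATE TABLE posting_copy LIKE POSTING;
    -- copy everything EXCEPT the rows to be deleted
    INSERT INTO posting_copy SELECT * FROM POSTING
        WHERE NOT (user_id = 5 AND updated_at < 100);
    RENAME TABLE POSTING TO posting_old, posting_copy TO POSTING;
    DROP TABLE posting_old;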

Piotr Pankowski
A: 

The best way is to delete incrementally using a LIMIT clause (10,000 rows at a time), but without any ordering. This lets MySQL flush the results more often and keeps the transactions from getting huge. You can easily do it from any programming language you have installed that has a MySQL connector. Be sure to commit after each statement.
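
A sketch of one iteration (again assuming the real column names are user_id and updated_at):

    -- repeat from your client code, committing each time,
    -- until the statement reports 0 affected rows
    DELETE FROM POSTING WHERE user_id = 5 AND updated_at < 100 LIMIT 10000;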

An index will definitely help, but building one on a 100 GB table will take a while too (it is still worth creating if you are going to reuse it in the future). By the way, your current query is incorrect: it references a table USER that does not appear in the statement. Design the index carefully so that the optimizer can actually benefit from it.
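
A composite index matching the delete's predicate might look like this (column names assumed as above):

    CREATE INDEX idx_posting_user_updated ON POSTING (user_id, updated_at);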

newtover
A: 

A while back I wanted to delete over 99% of the data from a table. The table was a sessions table with over 250 million rows, of which I only wanted the most recent 500K. The fastest way I came up with was to select the 500,000 rows I wanted into another table, drop the old table, and rename the new one to replace it. This was about 100 times faster than a regular delete, which has to pick out the records and rebuild the table.
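
The sequence described above might look like this (table and column names are illustrative; last_seen stands in for whatever column defines "most recent"):

    CREATE TABLE sessions_new LIKE sessions;
    -- keep only the ~500K most recent rows
    INSERT INTO sessions_new
        SELECT * FROM sessions ORDER BY last_seen DESC LIMIT 500000;
    RENAME TABLE sessions TO sessions_old, sessions_new TO sessions;
    DROP TABLE sessions_old;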

This also has the added benefit of reducing the on-disk file size if you're using InnoDB with innodb_file_per_table = 1, because InnoDB data files never shrink on their own.

Mark Maunder