I have about 10 tables with over 2 million records and one with 30 million. I would like to efficiently remove older data from each of these tables.

My general algorithm is:

  • create a temp table for each large table and populate it with newer data
  • truncate the original tables
  • copy tmp data back to original tables using: "insert into originaltable (select * from tmp_table)"
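
As a rough sketch of those three steps for a single table (the names `big_table` and `created_at` and the 90-day cutoff are placeholders assumed for illustration, not taken from the real schema):

```sql
-- Hypothetical names: big_table, created_at; the cutoff is an assumed example value.
BEGIN;

-- 1. Stash the rows that should survive in a temporary table.
CREATE TEMP TABLE tmp_big_table AS
SELECT *
FROM big_table
WHERE created_at >= now() - interval '90 days';

-- 2. Empty the original table. TRUNCATE is transactional in PostgreSQL,
--    so a failure later in this transaction rolls it back.
TRUNCATE TABLE big_table;

-- 3. Copy the kept rows back. Every index and constraint on big_table
--    is maintained row by row here, which is the slow part.
INSERT INTO big_table
SELECT * FROM tmp_big_table;

COMMIT;
```

Note that TRUNCATE refuses to run on a table that other tables reference through foreign keys unless those tables are truncated in the same statement (or CASCADE is used), so the order and grouping of tables matters.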

However, the last step of copying the data back is taking longer than I'd like. I thought about deleting the original tables and making the temp tables "permanent", but I lose constraint/foreign key info.

If I delete from the tables directly, it takes much longer. Given that I need to preserve all foreign keys and constraints, are there any faster ways of removing the older data?

Thanks.

+1  A: 

The fastest process is likely to be close to what you've outlined, with the indexes and foreign keys dropped before the copy and rebuilt afterwards:

  1. Copy new data into a temporary table
  2. Drop indexes and foreign keys
  3. Drop the old table
  4. Copy the temporary table back to the old table name
  5. Rebuild indexes and foreign keys.
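
A minimal sketch of these five steps, under some assumptions: the tables `orders`, `order_items`, and `customers`, their columns, the cutoff date, and the constraint names are all hypothetical, and it uses a regular staging table plus a rename in place of a true temporary table so that step 4 does not copy the data a second time.

```sql
-- Hypothetical schema: orders(id, customer_id, created_at),
-- order_items(order_id -> orders.id), customers(id).
BEGIN;

-- 1. Copy the rows to keep into a bare staging table.
--    LIKE copies column names, types and NOT NULL, but no indexes,
--    defaults or foreign keys.
CREATE TABLE orders_new (LIKE orders);
INSERT INTO orders_new
SELECT *
FROM orders
WHERE created_at >= DATE '2010-01-01';

-- 2. Drop foreign keys in other tables that point at the old table
--    (constraint name assumed; check the real one with \d order_items).
ALTER TABLE order_items DROP CONSTRAINT order_items_order_id_fkey;

-- 3. Drop the old table; its own indexes and constraints go with it.
DROP TABLE orders;

-- 4. Give the staging table the old name.
ALTER TABLE orders_new RENAME TO orders;

-- 5. Rebuild indexes and constraints, each in a single pass.
ALTER TABLE orders ADD PRIMARY KEY (id);
CREATE INDEX orders_created_at_idx ON orders (created_at);
ALTER TABLE orders
    ADD CONSTRAINT orders_customer_id_fkey
    FOREIGN KEY (customer_id) REFERENCES customers (id);
ALTER TABLE order_items
    ADD CONSTRAINT order_items_order_id_fkey
    FOREIGN KEY (order_id) REFERENCES orders (id);

COMMIT;
```

DDL is transactional in PostgreSQL, so the whole swap can be rolled back if any step fails. Column defaults are not carried over by a plain LIKE, and a sequence owned by a column of the dropped table is dropped with it, so a serial-style key would need its default and sequence recreated as part of step 5. Rows in `order_items` that point at deleted orders must be removed first, or the final foreign key will fail to validate.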

The Postgres manual has some suggestions on performance, too, that may or may not apply. Frankly, however, it is significantly quicker to drop a table than to delete millions of rows, since each delete is performed tuple by tuple. It is also significantly quicker to insert millions of rows into a table with no constraints or indexes: otherwise each constraint must be checked and each index updated for every inserted record. By removing all constraints and indexes first, you reduce this to a single build of each index and a single verification of each constraint.

ig0774
+1  A: 

The "standard" solution for these problems typically involves partitioning your tables on the appropriate key, such that when you need to delete old data, you can simply drop a whole partition -- certainly the fastest deletion that you will ever get.

However, partitioning in PostgreSQL isn't as easy as in some other databases: with inheritance-based partitioning you need to route rows to the right partition yourself using triggers, and there are caveats (e.g. no global primary keys).
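
To illustrate, here is a rough sketch of inheritance-based partitioning by month. The `events` table, its columns, the partition boundaries, and the function and trigger names are all made up for the example, and a real setup would need a child table and a routing branch for every period.

```sql
-- Hypothetical parent table; it stays empty, the children hold the rows.
CREATE TABLE events (
    id         bigint    NOT NULL,
    created_at timestamp NOT NULL,
    payload    text
);

-- One child table per month. The CHECK constraints let the planner skip
-- children that cannot match a query (constraint exclusion).
CREATE TABLE events_2010_05 (
    CHECK (created_at >= TIMESTAMP '2010-05-01'
       AND created_at <  TIMESTAMP '2010-06-01')
) INHERITS (events);

CREATE TABLE events_2010_06 (
    CHECK (created_at >= TIMESTAMP '2010-06-01'
       AND created_at <  TIMESTAMP '2010-07-01')
) INHERITS (events);

-- Trigger function that redirects inserts on the parent to the right child.
CREATE OR REPLACE FUNCTION events_insert_router() RETURNS trigger AS $$
BEGIN
    IF NEW.created_at >= TIMESTAMP '2010-06-01'
       AND NEW.created_at < TIMESTAMP '2010-07-01' THEN
        INSERT INTO events_2010_06 VALUES (NEW.*);
    ELSIF NEW.created_at >= TIMESTAMP '2010-05-01'
       AND NEW.created_at < TIMESTAMP '2010-06-01' THEN
        INSERT INTO events_2010_05 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for created_at = %', NEW.created_at;
    END IF;
    RETURN NULL;  -- row already stored in a child; do not insert into the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER events_insert_trigger
    BEFORE INSERT ON events
    FOR EACH ROW EXECUTE PROCEDURE events_insert_router();

-- Removing a whole month of old data is then one cheap statement.
DROP TABLE events_2010_05;
```

Each child table needs its own indexes and keys, which is where the "no global primary keys" caveat comes from: uniqueness of `id` is only enforced within a single child. The routing function also has to be updated as partitions are added and dropped.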

See the PostgreSQL manual on Partitioning

intgr