Hi All,

I'd like to move some data from one table to another (with a possibly different schema). The straightforward solution that comes to mind is:

START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
INSERT INTO dest_table SELECT data FROM orig_table, other_tables WHERE <condition>;
DELETE FROM orig_table USING other_tables WHERE <condition>;
COMMIT;

Now what if the amount of data is rather big, and the <condition> is expensive to compute? In PostgreSQL, a RULE or a stored procedure can be used to delete the data on the fly, evaluating the condition only once. Which solution is better? Are there other options?

A: 

You might dump the table data to a file, then load it into the other table using COPY. Usually COPY is faster than INSERT.
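
A rough sketch of that approach; the file path is a made-up placeholder, and <condition>/other_tables are as in the question. (Server-side COPY to a file needs superuser rights; psql's \copy is the client-side variant.)

-- dump the matching rows to a file, then load them into the destination
COPY (SELECT data FROM orig_table, other_tables WHERE <condition>) TO '/tmp/orig_dump.csv' WITH CSV;
COPY dest_table FROM '/tmp/orig_dump.csv' WITH CSV;

You would still need the DELETE from orig_table afterwards.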

pcent
I've made some tests processing large amounts of data using triggers, row by row, and using a stored procedure with a single transaction. The stored procedure approach was faster.
pcent
You should also fine-tune your PostgreSQL server to improve performance. Read: http://wiki.postgresql.org/wiki/Performance_Optimization
pcent
Yeah, I think that guideline should be qualified to say that one COPY is faster than a set of INSERT statements, one per row. For copying data around within the database, I would think INSERT ... SELECT is optimal, since the data never passes outside the executor.
araqnid
COPY is going to be faster than INSERT for external data. The OP is working with data that is already in the database, so an INSERT is going to be faster than exporting and then copying back in.
Scott Bailey
+5  A: 

If the condition is so complicated that you don't want to execute it twice (which BTW sounds unlikely to me, but anyway), one possibility would be to ALTER TABLE ... ADD COLUMN on the original table to add a boolean field, and run an UPDATE on the table to set that field to true WHERE <condition>. Then your INSERT and DELETE commands can simply check this column in their WHERE clauses.

Don't forget to drop the helper column afterwards -- from the destination table too, if it was copied across!
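
A minimal sketch of this approach; the column name move_me is made up, and <condition>/other_tables are as in the question:

-- flag the rows once, then reuse the flag for both statements
ALTER TABLE orig_table ADD COLUMN move_me boolean;
UPDATE orig_table SET move_me = true FROM other_tables WHERE <condition>;
INSERT INTO dest_table SELECT data FROM orig_table WHERE move_me;
DELETE FROM orig_table WHERE move_me;
ALTER TABLE orig_table DROP COLUMN move_me;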

Hmm, even less intrusive would be to create a new temporary table whose only purpose is to contain the PKs of the records you want to move. First INSERT into this table to "define" the set of rows to operate on, and then join with it for the table-copying INSERT and the DELETE. These joins will be fast, since table PKs are indexed.
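
For example (a sketch that assumes a single primary-key column id on orig_table; a composite key works the same way, just with more join columns):

-- collect the PKs of the rows to move, then join against them
CREATE TEMP TABLE rows_to_move AS
  SELECT o.id FROM orig_table o, other_tables WHERE <condition>;
INSERT INTO dest_table SELECT o.data FROM orig_table o JOIN rows_to_move USING (id);
DELETE FROM orig_table USING rows_to_move WHERE orig_table.id = rows_to_move.id;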


[EDIT] Scott Bailey's suggestion in the comments is obviously the right way to do this, wish I'd thought of it myself! Assuming all the original table's PK fields will be present in the destination table, there's no need for a temporary table -- just use the complex WHERE conditions to insert into the destination, then DELETE from the original table by joining to this table. I feel stupid for suggesting a separate table now! :)
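
Something like this, again assuming a PK column id that exists in both tables:

-- evaluate <condition> only once, during the INSERT
INSERT INTO dest_table SELECT data FROM orig_table, other_tables WHERE <condition>;
-- then delete exactly the rows that just landed in the destination
DELETE FROM orig_table USING dest_table WHERE orig_table.id = dest_table.id;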

j_random_hacker
The temp table gets my vote. Updating rows and then deleting them means creating a lot of garbage in the heap, as well as requiring a change to the table schema (not that that really matters).
araqnid
+1 for the temp table for PKs.
rfusca
You won't need the temp table, or to do the expensive calculation twice. Do the calculation once as you insert into the new table, then delete from the old table where the record is in the new table.
Scott Bailey
The destination table will have plenty of data as well, so this DELETE statement is potentially expensive. Your idea is good, but I'm still looking for something faster.
IggShaman
@IggShaman: Although I wouldn't rule it out, I can't see how anything could be much faster, short of writing a C extension that somehow rewires the existing rows into the new table at the disk level (which is probably impossible anyway). BTW, if your destination table has an index that includes all the PK fields of the source table, PostgreSQL will just read the index instead of the entire table.
j_random_hacker