views: 139

answers: 3

A dumbveloper at my work (years ago) moved the body column from our comments table to a secondary comment_extensions table as some sort of sketchy guesswork optimization. It seems ill-advised to do a join every time we want to display a comment, so I'm going to try moving that column back into our comments table and run some benchmarks.

My problem is that this update crawls. I let it run for an hour before shutting it off, fearing that it would take all night.

UPDATE comments
   SET body = comment_extensions.body
  FROM comment_extensions
 WHERE comments.id = comment_extensions.comment_id;

It's a PostgreSQL 8.1 database, and comment_extensions.comment_id is indexed.

Any suggestions for making this run faster?

+2  A: 

Well, as an academic question: why is this ill-advised? What percentage of lookups actually needs to know the comment info?

My suggestion: update in small batches (10,000 rows at a time?). It may still take all night. Depending on the nature of your system, you may also have to implement cut-over logic that prevents the system from updating or pulling from your extensions table during this migration.
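A minimal sketch of that batching, assuming comments.id is a dense integer key (the range bounds and batch size are illustrative):

UPDATE comments
   SET body = comment_extensions.body
  FROM comment_extensions
 WHERE comments.id = comment_extensions.comment_id
   AND comments.id BETWEEN 0 AND 9999;
-- repeat for 10000-19999, 20000-29999, ..., committing between batches

Each batch holds its row locks only briefly, so production traffic can interleave with the migration.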

Large databases hurt like that ;)

Autocracy
Every time we display a comment we need the corresponding record in the `comment_extensions` table for the comment text, so that's a join 100% of the time. It seems like there's no point in there being two tables where they should be one. Thanks for the suggestion on doing small batches. We could keep the site limping along during the migration if we did that.
Yeah, that sounds like a negative "optimization." Check for a comment field in the primary table. If the primary table's comment is null, check for a comment in the extensions table. Insert all new comments into the primary table. That will keep your site going until you're ready to dump the extensions table. The small batch updates (depending on the load of your site, 10k may be too much) will let the system do its production work while the migration happens in the background.
Autocracy
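A sketch of the cut-over read path Autocracy describes, preferring the migrated column and falling back to the extensions table while the migration is still in flight (any column names other than body are assumptions):

SELECT c.id,
       COALESCE(c.body, ce.body) AS body
  FROM comments c
  LEFT JOIN comment_extensions ce ON ce.comment_id = c.id;

Once every row has a non-null comments.body, the LEFT JOIN and COALESCE can be dropped along with the extensions table.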
+1  A: 

You might get some benefit from disabling logging while doing this. If it is a test in a non-production database, you probably don't need the protection a logfile gives you.

If there is an index or key on comments.body then drop it before the update and recreate it afterward.
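For example (the index name here is an assumption; check pg_indexes for the actual one):

DROP INDEX comments_body_idx;
-- ... run the bulk UPDATE here ...
CREATE INDEX comments_body_idx ON comments (body);

Maintaining an index row by row during a mass update is much slower than rebuilding it once at the end.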

Is the comments.body field a fixed-width char(N) or a varchar? varchar used to be slower than char(), and I suspect it still is, so use a char rather than a varchar.

If you do a SELECT that merges the data out to a data file (say, quoted CSV) and write a script to turn that into INSERTs, you can then empty the comments table and reload it with those INSERTs. That might be faster than the query you have, though the index on comments.id is helping the speed.
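COPY is the usual Postgres tool for that kind of dump and reload, since it avoids per-row INSERT overhead. A sketch with an illustrative file path and column list; note that COPY from an arbitrary SELECT only arrived in 8.2, so on 8.1 you would materialize the join into a table first (as the next answer suggests):

CREATE TABLE comments_merged AS
  SELECT c.id, ce.body
    FROM comments c
    JOIN comment_extensions ce ON ce.comment_id = c.id;

COPY comments_merged TO '/tmp/comments_merged.csv' WITH CSV;
TRUNCATE comments;
COPY comments (id, body) FROM '/tmp/comments_merged.csv' WITH CSV;

Server-side COPY to a file requires superuser; psql's \copy is the client-side alternative.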

3e6 records are going to take some time regardless.

Eric M
+2  A: 

How about this?

http://www.postgresql.org/docs/8.1/interactive/sql-createtableas.html

CREATE TABLE joined_comments
    AS SELECT c.id, c.author, c.blablabla, ce.body
    FROM comments c LEFT JOIN comment_extensions ce
    ON c.id = ce.comment_id;

That would create a new joined_comments table. That could be almost enough (you'd still need to recreate indexes and so on), but I remember Postgres 8.1 has a bug in the way serial columns get created (sorry, can't find a link).

So my suggestion would be: after you have this new joined table, COPY TO a BINARY file from that joined_comments table, create a new comments table declaring id as a SERIAL right from the start, then COPY FROM that BINARY file into the new comments table. Then recreate indexes.
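A hedged sketch of that sequence; the file path, column types, and sequence name are assumptions, and COPY BINARY requires the source and target column types to match exactly:

COPY joined_comments TO '/tmp/joined_comments.bin' WITH BINARY;

CREATE TABLE comments_new (
    id        SERIAL PRIMARY KEY,
    author    text,
    blablabla text,
    body      text
);

COPY comments_new FROM '/tmp/joined_comments.bin' WITH BINARY;
-- advance the sequence past the highest migrated id (default name assumed)
SELECT setval('comments_new_id_seq', (SELECT max(id) FROM comments_new));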

oboxodo