views:

95

answers:

4

Hello,

I am trying a simple UPDATE table SET column1 = 0 on a table with ~3 million rows on Postgres 8.4, but it is taking forever to finish. It has been running for more than 10 minutes now in my latest attempt.

Before that, I tried running the VACUUM and ANALYZE commands on that table, and I also tried creating some indexes (although I doubt they will make any difference in this case), but nothing seems to help.

Any other ideas?

Thanks, Ricardo

Update: This is the table structure:

CREATE TABLE myTable
(
  id bigserial NOT NULL,
  title text,
  description text,
  link text,
  "type" character varying(255),
  generalFreq real,
  generalWeight real,
  author_id bigint,
  status_id bigint,
  CONSTRAINT resources_pkey PRIMARY KEY (id),
  CONSTRAINT author_pkey FOREIGN KEY (author_id)
      REFERENCES users (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT c_unique_status_id UNIQUE (status_id)
);

I am trying to run UPDATE myTable SET generalFreq = 0;

A: 

How are you running it? If you are looping over each row and issuing an update statement per row, you are running potentially millions of individual updates, which is why it performs so slowly.

If you update all records in a single statement, it should run a lot faster; if that is still slow, it's probably down to your hardware more than anything else. 3 million is a lot of records.
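If a single statement stalls, one workaround is splitting the work into primary-key ranges so each transaction stays small. A sketch, assuming the id values from the question's bigserial column are roughly dense (the 100k batch size is an arbitrary example):

```sql
-- Update 100k rows per statement; each one commits independently
-- when run from psql, so locks and transaction size stay bounded.
UPDATE myTable SET generalFreq = 0 WHERE id BETWEEN 1      AND 100000;
UPDATE myTable SET generalFreq = 0 WHERE id BETWEEN 100001 AND 200000;
-- ... and so on up to the maximum id, optionally running
-- VACUUM myTable between batches to reclaim dead row versions.
```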

Tom Gullen
Hi Tom, thanks. I am running a single update from the psql command line. I understand 3 million is a lot, but in my experience with other databases, it shouldn't take more than 10 minutes to run a single update on one numeric column.
Ricardo
I wouldn't have expected it to take so long either, especially with a constant assignment (setting all fields to 0); memory-wise this should be pretty fast for a DB to handle. I've only limited experience with Postgres, but you could try doing it in batches of 100k and timing each batch to estimate how long the full 3 million should take. It might just be that Postgres isn't very good at this unusual operation.
Tom Gullen
+1  A: 

After waiting 35 min. for my UPDATE query to finish (and it still hadn't), I decided to try something different. So what I did was run this command:

CREATE TABLE myTable2 AS
SELECT id, title, description, link, "type",
       0::real AS generalFreq,
       generalWeight, author_id, status_id
FROM myTable;

That took only 1.7 min. to process, plus some extra time to recreate the indexes and constraints. But it did work! :)

Of course, that only worked because nobody else was using the database. I would need to lock the table first if this were a production environment.
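For completeness, the swap after the copy might look roughly like this. This is a sketch, not tested; it assumes the table and constraint names from the question's CREATE TABLE statement:

```sql
BEGIN;
ALTER TABLE myTable  RENAME TO myTable_old;
ALTER TABLE myTable2 RENAME TO myTable;
-- recreate the constraints from the original definition
ALTER TABLE myTable ADD CONSTRAINT resources_pkey PRIMARY KEY (id);
ALTER TABLE myTable ADD CONSTRAINT author_pkey FOREIGN KEY (author_id)
    REFERENCES users (id);
ALTER TABLE myTable ADD CONSTRAINT c_unique_status_id UNIQUE (status_id);
COMMIT;

DROP TABLE myTable_old;  -- only once you are sure the copy is complete
```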

Thanks, Ricardo

Ricardo
PostgreSQL's MVCC implementation makes updates expensive. If you're updating every row in the table, each row needs to be copied as a new version and the old version marked as deleted. So it's not surprising that rewriting the table is faster (which is what altering the type of a column does automatically, for instance). There's not much you can do about it; it's just a performance characteristic to be aware of.
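This row-version churn can be observed directly through the statistics views available in 8.4 (a sketch; the size roughly doubling assumes the table had little free space to begin with):

```sql
SELECT pg_size_pretty(pg_relation_size('myTable'));  -- size before

UPDATE myTable SET generalFreq = 0;
-- every row now exists twice on disk: the old (dead) version
-- and the new one created by the update

SELECT pg_size_pretty(pg_relation_size('myTable'));  -- noticeably larger
SELECT n_dead_tup FROM pg_stat_user_tables WHERE relname = 'mytable';

VACUUM myTable;  -- marks the dead versions as reusable space
```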
araqnid
Thanks for the explanation, araqnid. I didn't know PostgreSQL implemented updates like that.
Ricardo
+1  A: 

Take a look at this topic: http://stackoverflow.com/questions/3100072/postgresql-slow-on-a-large-table-with-arrays-and-lots-of-updates/3100232#3100232

First, start with a better FILLFACTOR, do a VACUUM FULL to force a table rewrite, and check the HOT updates after your UPDATE query:

SELECT n_tup_hot_upd, * FROM pg_stat_user_tables WHERE relname = 'mytable';

(Unquoted identifiers are folded to lower case, so the table appears as 'mytable' in the statistics views.)

HOT updates are much faster when you have a lot of records to update. More information about HOT can be found in this article: http://pgsql.tapoueh.org/site/html/misc/hot.html

PS: You need version 8.3 or newer.
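Concretely, lowering the fill factor and rewriting the table before the big UPDATE might look like this (a sketch; 70 is an arbitrary example value, and note that VACUUM FULL takes an exclusive lock while it runs):

```sql
ALTER TABLE myTable SET (FILLFACTOR = 70);  -- keep ~30% of each page free for HOT updates
VACUUM FULL myTable;                        -- rewrite the table so the new fill factor takes effect

UPDATE myTable SET generalFreq = 0;

-- HOT updates avoid index maintenance; a high n_tup_hot_upd is what you want
SELECT n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'mytable';
```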

Frank Heikens
Thanks! This clears things up.
Ricardo
A: 

Try UPDATE myTable SET generalFreq = 0.0; maybe it is a casting issue.

Chocolim