ansaurus

Question

Slow simple update query on Postgresql database with 3 million rows

Answer 1

A:

How are you running it? If you are looping each row and performing an update statement, you are running potentially millions of individual updates which is why it will perform incredibly slowly.

If you are running a single update statement for all records in one statement it would run a lot faster, and if this process is slow then it's probably down to your hardware more than anything else. 3 million is a lot of records.

Tom Gullen 2010-07-29 10:06:12

Hi Tom, thanks. I am running a single update from the psql command line. I understand 3 million is a lot but in my experience with other databases, it shouldn't take more than 10 min. to run a single update in one numeric column.

Ricardo 2010-07-29 11:00:17

I wouldn't of expected it to take so long either, especially with a constant assignment (setting all fields to 0), memory wise this should be pretty fast for a DB to handle. I've only limited experience with Postgres, but you could try doing it in batches of 100k and timing it to see how long you can expect the 3 million to run, it might just be the case Postgres isn't very good at this unusual operation.

Tom Gullen 2010-07-29 11:06:21

Answer 2

+1 A:

After waiting 35 min. for my UPDATE query to finish (and still didn't) I decided to try something different. So what I did was a command:

CREATE TABLE myTable 2 AS 
SELECT 
all the fields of my table except the one I wanted to update, 0 as myFieldToUpdate
from myTable

That took only 1,7 min. to process plus some extra time to recreate the indexes and constraints. But it did work! :)

Of course that did work only because nobody else was using the database. I would need to lock the table first if this was in a production environment.

Thanks, Ricardo

Ricardo 2010-07-29 11:18:17

Postgresql's MVCC implementation makes updates expensive. If you're updating every row in the table, each row needs to be copied as a new version, and the old version marked as deleted. So it's not surprising that rewriting the table is faster (which is what altering the type of a column does automatically, for instance). not much you can do about it, just a performance characteristic to be aware of.

araqnid 2010-07-29 17:40:40

Thanks for the explanation, araqnid. I didn't know postgresql did implement updates like that.

Ricardo 2010-07-29 18:43:05

Answer 3

+1 A:

Take a look at this topic: http://stackoverflow.com/questions/3100072/postgresql-slow-on-a-large-table-with-arrays-and-lots-of-updates/3100232#3100232

First start with a better FILLFACTOR, do a VACUUM FULL to force table rewrite and check the HOT-updates after your UPDATE-query:

SELECT n_tup_hot_upd, * FROM pg_stat_user_tables WHERE relname = 'myTable';

HOT updates are much faster when you have a lot of records to update. More information about HOT can be found in this article: http://pgsql.tapoueh.org/site/html/misc/hot.html

Ps. You need version 8.3 or better.

Frank Heikens 2010-07-30 08:55:54

Thanks! This clears things up.

Ricardo 2010-08-04 12:34:18

Answer 4

A:

try UPDATE myTable SET generalFreq = 0.0; Maybe is a casting issue

Chocolim 2010-08-10 21:08:25

ansaurus

tags:

views:

answers:

Slow simple update query on Postgresql database with 3 million rows

related questions