Hi there,

I have been running an UPDATE on a table containing 250 million rows and 3 indexes; this UPDATE uses another table containing 30 million rows. It has been running for about 36 hours now. I am wondering if there is a way to find out how close it is to being done: if it plans to take a million days to do its thing, I will kill it, but if it only needs another day or two, I will let it run. Here is the query:

UPDATE pagelinks SET pl_to = page_id
    FROM page
    WHERE 
        (pl_namespace, pl_title) = (page_namespace, page_title)
        AND
        page_is_redirect = 0
;

The EXPLAIN is not the issue here, and I only mention the big table's multiple indexes to somewhat justify how long the UPDATE is taking. But here is the EXPLAIN anyway:

Merge Join  (cost=127710692.21..135714045.43 rows=452882848 width=57)
  Merge Cond: (("outer".page_namespace = "inner".pl_namespace) AND ("outer"."?column4?" = "inner"."?column5?"))
  ->  Sort  (cost=3193335.39..3219544.38 rows=10483593 width=41)
        Sort Key: page.page_namespace, (page.page_title)::text
        ->  Seq Scan on page  (cost=0.00..439678.01 rows=10483593 width=41)
              Filter: (page_is_redirect = 0::numeric)
  ->  Sort  (cost=124517356.82..125285665.74 rows=307323566 width=46)
        Sort Key: pagelinks.pl_namespace, (pagelinks.pl_title)::text
        ->  Seq Scan on pagelinks  (cost=0.00..6169460.66 rows=307323566 width=46)

Now I also issued a parallel command to DROP one of pagelinks' indexes; of course it is waiting for the UPDATE to finish (but I felt like trying it anyway!). Hence, I cannot SELECT anything from pagelinks for fear of corrupting the data (unless you think it would be safe to kill the DROP INDEX backend process?).

So I am wondering if there is a table that keeps track of the amount of dead tuples or something similar, because it would be nice to know how fast or how far along the UPDATE is in completing its task.
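
For example, is there something along these lines? (pg_stat_user_tables is only my guess at the relevant statistics view, and I gather its counters may only be refreshed when a transaction ends, so it might not reflect the UPDATE while it is still running.)

-- my guess at a progress check: watch the dead-tuple count on the big table
SELECT relname, n_live_tup, n_dead_tup, last_vacuum
    FROM pg_stat_user_tables
    WHERE relname = 'pagelinks';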

Thx (PostgreSQL is not as intelligent as I thought; it needs heuristics)

+1  A: 

Did you read the PostgreSQL documentation for "Using EXPLAIN", to interpret the output you're showing?

I'm not a regular PostgreSQL user, but I just read that doc and then compared it to the EXPLAIN output you're showing. Your UPDATE query is using no indexes, so it's forced to do full table scans and then sort both page and pagelinks. The sorts are no doubt large enough to need temporary disk files, which I think are created under your temp_tablespaces location.
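
Incidentally, the amount of memory each of those sorts may use before spilling to disk is governed by the work_mem setting; you can check what your server is configured with:

-- how much memory a single sort may use before it spills to temp files on disk
SHOW work_mem;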

Then I see the estimated database pages read. The top-level of that EXPLAIN output says (cost=127710692.21..135714045.43). The units here are in disk I/O accesses. So it's going to access the disk over 135 million times to do this UPDATE.

Note that even 10,000 RPM disks with 5 ms seek times can achieve at best about 200 I/O operations per second under optimal conditions. This would mean that your UPDATE needs roughly 188 hours (7.8 days) of disk I/O, even if you could sustain saturated disk I/O for that whole period (i.e. continuous reads/writes with no breaks). That's unrealistic in practice, and I'd expect the actual throughput to be lower by at least an order of magnitude, especially since you have no doubt been using this server for all sorts of other work in the meantime. So I'd guess you're only a fraction of the way through your UPDATE.
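
Spelling that estimate out:

    135,714,045 page fetches / 200 fetches per second
        ≈ 678,570 seconds
        ≈ 188 hours
        ≈ 7.8 days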

If it were me, I would have killed this query on the first day, and found another way of performing the UPDATE that made better use of indexes and didn't require on-disk sorting. You probably can't do it in a single SQL statement.
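
For example (only a sketch, not something I've tested against your schema: the pagelinks_new name is made up, you would need to carry along whatever other columns pagelinks has, and you would still have to rebuild the indexes and rename the table afterwards), building a replacement table avoids the giant in-place rewrite:

-- write the corrected rows once, sequentially, instead of updating 250M rows in place
CREATE TABLE pagelinks_new AS
    SELECT pl.pl_namespace,
           pl.pl_title,
           COALESCE(p.page_id, pl.pl_to) AS pl_to  -- keep the old value where no non-redirect page matches
           -- ...plus any other pagelinks columns
    FROM pagelinks pl
    LEFT JOIN page p
        ON  p.page_namespace   = pl.pl_namespace
        AND p.page_title       = pl.pl_title
        AND p.page_is_redirect = 0;
-- then CREATE INDEX on pagelinks_new, drop the old pagelinks, and rename pagelinks_new into place

Each row gets written once and the indexes get built once at the end, rather than maintaining all three indexes while 250 million new row versions are created, and the original pagelinks stays readable until you swap.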

As for your DROP INDEX, I would guess it's simply blocking, waiting for exclusive access to the table, and while it's in this state I think you can probably kill it.
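
If you want to confirm that before killing it, a query along these lines (a rough sketch against the standard pg_locks view) should show the ungranted lock request from the DROP INDEX:

-- sessions holding or waiting for locks on pagelinks; granted = false means still waiting
SELECT l.pid, l.mode, l.granted
    FROM pg_locks l
    JOIN pg_class c ON c.oid = l.relation
    WHERE c.relname = 'pagelinks';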

Bill Karwin
Postgres' cost estimate is based on the number of memory pages it needs to touch. I think each page is 1/2 KB.
Barry Brown
The doc I read says: "Traditional practice is to measure the costs in units of disk page fetches"
Bill Karwin
I'm not saying you're wrong, only that my answer was based on what the doc said. :-)
Bill Karwin
I think we're both right, because on typical Unix systems, disk blocks and memory pages are the same size. But the OS will be able to fetch multiple blocks at once.
Barry Brown
I did actually kill it at the end of the second day! I have been waiting for the VACUUM FULL on this table to finish for the last 2 days now! Thanks for the knowledge concerning the EXPLAIN; I knew it was a measure of time, but now I know that it is in disk I/O.
Nicholas Leonard
I still wonder, does this estimate consider allocated RAM?
Nicholas Leonard
No. It's just an estimate of the number of pages that it needs to look at, whether or not they're in memory.
Barry Brown
Thx again. This is good info.
Nicholas Leonard
A: 

You need indexes or, as Bill pointed out, the query will have to do sequential scans on both tables.

CREATE INDEX page_ns_title_idx ON page (page_namespace, page_title);
CREATE INDEX pl_ns_title_idx ON pagelinks (pl_namespace, pl_title);
CREATE INDEX page_redir_idx ON page (page_is_redirect);
Barry Brown
I already had those indexes. I went into the postgresql.conf file and played with the query-optimizer and index-related variables in order to strongly encourage the planner to choose wisely! When my VACUUM of the table finishes, I will try it out without the redirect clause. Thx
Nicholas Leonard