ansaurus

Question

PostgreSQL - Why are some queries on large datasets so incredibly slow

Answer 1

+1 A:

Try reducing your random_page_cost (default: 4) relative to seq_page_cost. This reduces the planner's preference for sequential scans by making index-driven random access look cheaper.
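A minimal sketch of trying this out for one session before touching postgresql.conf. The value 2.0 is only an illustrative starting point, not a recommendation, and the statement being explained is assumed to be the UPDATE on rcra_sites from the question:

```sql
-- Show the current value (4 unless it has been changed).
SHOW random_page_cost;

-- Lower it for this session only; other sessions and the config file are unaffected.
SET random_page_cost = 2.0;

-- Plain EXPLAIN shows the chosen plan without executing the UPDATE.
EXPLAIN
UPDATE rcra_sites
   SET street = regexp_replace(street, '/', '', 'i')
 WHERE street ~ '/';

-- Go back to the configured default when done experimenting.
RESET random_page_cost;
```

If the plan improves, the setting can then be made permanent in postgresql.conf.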

Another thing to bear in mind is that MVCC makes updating a row fairly expensive, because every update writes a new row version. In particular, updating every row in a table roughly doubles the storage the table uses until the old row versions can be vacuumed. So in your first query, you may want to qualify your update:

UPDATE rcra_sites
   SET street = regexp_replace(street, '/', '', 'i')
 WHERE street ~ '/';

(As far as I know, PostgreSQL doesn't automatically suppress an update when the new values are identical to the old ones. I seem to recall a standard trigger function was added in 8.4 (?) to let you do that, but it's perhaps better to handle it on the client side.)
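The trigger function being recalled here is most likely suppress_redundant_updates_trigger(), which ships with PostgreSQL from 8.4 onward. A sketch of attaching it to the table, where the trigger name z_min_update is just an illustrative choice:

```sql
-- Skip any UPDATE that would leave the row unchanged.
-- The name starts with "z" so that it fires after other BEFORE UPDATE triggers,
-- which run in alphabetical order by name.
CREATE TRIGGER z_min_update
BEFORE UPDATE ON rcra_sites
FOR EACH ROW EXECUTE PROCEDURE suppress_redundant_updates_trigger();
```

The trigger adds a small per-row comparison cost, so it only pays off when a meaningful share of updates really are no-ops. A WHERE clause like the one above avoids the cost entirely.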

araqnid 2010-05-19 12:03:48

These settings were commented out by default. I changed random_page_cost to 2.0 and seq_page_cost to 3.0. The query planner is now deciding to use both indexes. Nice! Yes, I should qualify the records to update more often, but in many cases I really am updating all rows. The time of the query is now down to 6 minutes. But that still seems too long to me.

Brad Mathews 2010-05-19 16:07:26

Setting seq_page_cost higher than random_page_cost feels wrong to me, but if it works for you... The slowness of updating most of the table is the cost of letting concurrent transactions access the old versions of the rows while the update is in progress. Other databases work differently: in other systems, such a big update could lock up the entire table or exhaust undo/redo space, for example. Sorry I can't think of much else to help.

araqnid 2010-05-19 16:24:05

Answer 2

A:

When a row is updated, a new row version is written.

If the new row does not fit in the same disk block, then every index entry pointing to the old row needs to be updated to point to the new row.

It is not just indexes on the updated data that need updating.

If you have a lot of indexes on rcra_sites, and only one or two frequently updated fields, then you might gain by separating the frequently updated fields into a table of their own.
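A rough sketch of what such a split could look like. All names other than rcra_sites and street are hypothetical: site_id is assumed to be the primary key of rcra_sites, and last_checked stands in for whichever field is updated frequently:

```sql
-- Narrow side table holding only the frequently updated field.
-- site_id, rcra_site_status and last_checked are hypothetical names.
CREATE TABLE rcra_site_status (
    site_id      integer PRIMARY KEY REFERENCES rcra_sites (site_id),
    last_checked timestamp
);

-- Frequent updates now touch only this table and its single index,
-- not the many indexes on rcra_sites.
UPDATE rcra_site_status
   SET last_checked = now()
 WHERE site_id = 42;

-- Join the two tables when both sets of columns are needed.
SELECT s.street, t.last_checked
  FROM rcra_sites s
  JOIN rcra_site_status t USING (site_id);
```

The trade-off is an extra join on reads, so this is only worth it when the updates clearly dominate.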

You can also reduce the fillfactor percentage below its default of 100, so that some of the updates can result in new rows being written to the same block, resulting in the indexes pointing to that block not needing to be updated.
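For illustration, lowering the fillfactor might look like the following. The value 80 is an arbitrary example, and whether same-block updates actually happen depends on the workload (in particular, on whether the updated columns are indexed):

```sql
-- Leave roughly 20% of each page free for new row versions.
ALTER TABLE rcra_sites SET (fillfactor = 80);

-- The setting only affects pages written from now on; existing pages stay
-- full until the table is rewritten (for example with CLUSTER).

-- On 8.3 and later, compare total updates with those that stayed in the
-- same block and needed no index changes (HOT updates).
SELECT n_tup_upd, n_tup_hot_upd
  FROM pg_stat_user_tables
 WHERE relname = 'rcra_sites';
```

If n_tup_hot_upd stays near zero after the change, the free space is not helping and the fillfactor can be put back.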

Stephen Denne 2010-05-20 02:23:20
