ansaurus

Question

Why does PostgreSQL query performance drop over time, but recover when the index is rebuilt?

Answer 1

+2 A:

Autovacuum should do the trick, provided you have configured it for the performance you need.

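Notes: VACUUM FULL: this reclaims the most disk space and returns it to the operating system. It takes an exclusive lock on the whole table while it runs.

VACUUM: this marks the space held by dead rows as reusable, so it reclaims some disk space for the table but seldom returns it to the operating system. It can run in parallel with a production system, but it generates a lot of I/O, which can impact performance.

ANALYZE: this rebuilds the query planner statistics. Plain VACUUM does not run it; use VACUUM ANALYZE to do both in one pass, or run ANALYZE on its own. The autovacuum daemon issues both automatically.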

More detailed notes found here

Tim Drisdelle 2010-03-08 15:08:39

Any suggestions on why vacuuming would make such a difference? Given the docs at the link, I can only guess that "To update data statistics used by the PostgreSQL query planner." would have a performance impact. However, would a distribution inefficiency be capable of causing this much of a performance problem? Could such an inefficiency convince PostgreSQL to skip the index and perform a table scan (i.e. render the index useless)?

Jim Rush 2010-03-08 17:09:56

The performance gain from VACUUM comes mainly from disk space recovery and query plan optimization. Your question about distribution inefficiency relates to disk space recovery (see section 23.1.2 in that link). And yes, it can have a big impact with large enough data sets.
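To make the disk space point concrete, here is a toy sketch in Python. It is not how PostgreSQL stores rows: the ToyTable class and its scan_cost method are hypothetical, and real VACUUM marks space as reusable rather than compacting the table. It only illustrates the idea that every update leaves a dead row version behind, so a heavily updated table gets more expensive to scan until the dead versions are cleaned up.

```python
# Toy model only: not PostgreSQL's actual storage format.
class ToyTable:
    def __init__(self):
        self.versions = []  # each entry: [key, value, is_live]

    def insert(self, key, value):
        self.versions.append([key, value, True])

    def update(self, key, value):
        # An update marks the old version dead and appends a new one.
        for version in self.versions:
            if version[0] == key and version[2]:
                version[2] = False
        self.versions.append([key, value, True])

    def scan_cost(self):
        # A full scan has to step over dead versions as well as live ones.
        return len(self.versions)

    def live_rows(self):
        return sum(1 for version in self.versions if version[2])

    def vacuum(self):
        # Simplification: drop dead versions so later scans see only live data.
        self.versions = [version for version in self.versions if version[2]]


table = ToyTable()
for task_id in range(100):
    table.insert(task_id, "pending")
for _ in range(5):  # assume each task is retried five times
    for task_id in range(100):
        table.update(task_id, "retry")

cost_before = table.scan_cost()
table.vacuum()
cost_after = table.scan_cost()
print(table.live_rows(), cost_before, cost_after)
```

The number of live rows never changes in this sketch; only the amount of dead weight a scan has to wade through does.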

Tim Drisdelle 2010-03-09 08:07:32

Answer 2

A:

Is the '2010-05-20T13:00:00.000' value that xmlscheduledtime is being compared to part of the SQL, or supplied as a parameter?

When planning how to run the query, saying that a field must be less than a supplied parameter with an as yet unknown value doesn't give PostgreSQL much to go on. It doesn't know whether that'll match nearly all the rows, or hardly any of the rows.

Reading about how the planner uses statistics helps tremendously when trying to figure out why your database is using the plans it is.
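As a rough illustration of why the parameter value matters, the Python sketch below estimates what fraction of a column falls below a given bound by looking at a sorted sample. The sample values and the estimated_selectivity function are made up for this example, and PostgreSQL's real estimator is more elaborate; the point is only that the same "less than" condition can match almost nothing or almost everything depending on the value supplied.

```python
from bisect import bisect_left

# Hypothetical sample of xmlscheduledtime values, one per hour on 2010-05-20.
sample = sorted("2010-05-20T%02d:00:00.000" % hour for hour in range(24))


def estimated_selectivity(sorted_sample, upper_bound):
    """Return the fraction of sampled values strictly less than upper_bound."""
    return bisect_left(sorted_sample, upper_bound) / len(sorted_sample)


few = estimated_selectivity(sample, "2010-05-20T01:00:00.000")
most = estimated_selectivity(sample, "2010-05-20T23:00:00.000")
print(few, most)
```

With the first bound only one sampled value in 24 qualifies, which favours an index scan; with the second, 23 of 24 qualify, where reading the whole table is likely cheaper. A plan chosen before the value is known cannot tell these cases apart.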

You might get better select performance by changing the order of the fields in that complex index, or by creating a new index, with the fields ordered (campaignfqname, currentstate, xmlscheduledtime). The index will then take you straight to the campaign FQ name and current state you are interested in, and every row in the index scan over the xmlscheduledtime range will be one you're after.
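The effect of that field order can be sketched in Python by treating a sorted list of tuples as a stand-in for the index. The rows, campaign names, state values and the range_scan helper are all hypothetical; only the three column names come from the question.

```python
from bisect import bisect_left

# Hypothetical rows: (campaignfqname, currentstate, xmlscheduledtime).
rows = [
    ("campaign.a", "READY", "2010-05-20T09:00:00.000"),
    ("campaign.a", "READY", "2010-05-20T15:00:00.000"),
    ("campaign.a", "DONE", "2010-05-20T08:00:00.000"),
    ("campaign.b", "READY", "2010-05-20T07:00:00.000"),
    ("campaign.b", "DONE", "2010-05-20T10:00:00.000"),
]

# Sorting the tuples mimics an index ordered by
# (campaignfqname, currentstate, xmlscheduledtime).
index = sorted(rows)


def range_scan(index, campaign, state, before):
    """Return entries for one campaign and state scheduled before `before`."""
    # The empty string sorts before every timestamp, so this finds the
    # first entry for the (campaign, state) pair.
    start = bisect_left(index, (campaign, state, ""))
    stop = bisect_left(index, (campaign, state, before))
    return index[start:stop]


matches = range_scan(index, "campaign.a", "READY", "2010-05-20T13:00:00.000")
print(matches)
```

Because the two equality fields lead the key, the matching entries form one contiguous slice and nothing inside that slice has to be filtered out. With xmlscheduledtime first, the slice would span every campaign and state scheduled before the bound.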

Stephen Denne 2010-03-08 20:06:10

It's supplied as a parameter. The logic involves scheduling and retrying a significant number of work tasks. If I had been involved in the original design, that field would have been numeric instead of text (we have a strong desire to stay as database agnostic as possible and therefore wouldn't have used a timestamp field). I've wondered whether the type of comparison on that field had any relation to our problem, but I lack the resources to create enough test cases to understand the problem better.
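On the text comparison itself, a small Python check suggests the ordering is at least correct as long as every value uses the same zero-padded format as '2010-05-20T13:00:00.000'. This assumes plain character-by-character comparison; in PostgreSQL the result of comparing text also depends on the database collation, so treat it as a sanity check rather than proof. The sample values are made up.

```python
from datetime import datetime

FORMAT = "%Y-%m-%dT%H:%M:%S.%f"

# Hypothetical values in the same fixed-width format as the question's literal.
stamps = [
    "2010-05-20T13:00:00.000",
    "2009-12-31T23:59:59.999",
    "2010-05-20T09:30:00.000",
    "2010-11-01T00:00:00.000",
]

by_text = sorted(stamps)
by_time = sorted(stamps, key=lambda s: datetime.strptime(s, FORMAT))
same_order = by_text == by_time

# Without zero padding the text order breaks: May sorts after November.
unpadded_is_earlier = "2010-5-20T13:00:00.000" < "2010-11-01T00:00:00.000"
print(same_order, unpadded_is_earlier)
```

So the text column should sort the same way a timestamp would, provided no writer ever stores an unpadded or differently formatted value.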

Jim Rush 2010-03-08 20:34:23
