views:

740

answers:

5

Hi,

I can see in the postgresql logs that certain simple queries (no joins and using only match conditions that use indexes) take anywhere from 1 to 3 seconds to execute. I log queries that take longer than a second to execute thus there are similar queries which execute under a second which don't get reported.

When I try the same query using EXPLAIN ANALYZE, it takes a few milliseconds.

The table contains around 8 million records and is written to and queried extensively. I have enabled auto vacuum and even recently (few hours ago) ran a VACUUM ANALYZE on that table.

Sample query log entry: Dec 30 10:14:57 db01 postgres[7400]: [20-1] LOG: duration: 3857.322 ms statement: SELECT * FROM "responses" WHERE ("responses".contest_id = 17469) AND (user_id is not Dec 30 10:14:57 db01 postgres[7400]: [20-2] null) ORDER BY updated_on desc LIMIT 5

contest_id and user_id is indexed. updated_on is not indexed. If I index it, the query planner ignores the contest_id index ans uses updated_on instead which further slows down the query. The maximum entries on the above query without the LIMIT would not be more than 1000.

Any help would be much appreciated.

+3  A: 

A few more details might be helpful here, depending on whether or not you can provide them. Most useful would be the actual output of your EXPLAIN ANALYZE, so that we can see what it does in completing the query. The definition of the table being queried might prove helpful too, along with the indexes. The more information the merrier. I can only speculate right now on what is going on, here are a few blind stabs:

  • A lot of other SELECTs are happening on this database at the same time, and periodically the data and/or result is expiring out of some cache somewhere.
  • There is something else that periodically locks this table for upwards of 3-4 seconds before releasing it again, during which time this query is stuck
  • This table is written to so extensively that the table statistics end up almost never reflecting reality, and as such the query analyzer botches the decision on whether or not to use index(es) to perform the query.

Other people might have other ideas, but yeah. More info on what is happening might prove useful.

Ed Carrel
If the problem is #1 (A lot of other SELECTs....), what can I do to improve the performance?I just ran an Explain Analyze and it took around 4 seconds. When I ran it again with different ids, it took a gew milliseconds.Hence I suspect it's related to data getting swapped out from the cache.
Nikhil Gupte
@Nikhil: It's not enough to run it, you have to also read it. If you want others to help you understand your performance issues, we'll have to read it, too.
Dustin
A: 

This seems to be happening due to swapping.

Nikhil Gupte
Buy RAM or reduce the number of processes other than DB on the machine.
Grant Johnson
A: 

pgsql-performance is a great mailing list to ask these kinds of questions.

It seems like you have two problems here:

1) You want to be able to index updated_on, but if you do, PostgreSQL chooses the wrong plan.

My first wild guess would be that PostgreSQL is overestimating the number of tuples that match the predicate "(responses.contest_id = 17469) AND (user_id is not null)". If postgres uses that predicate first, it must later sort the values to implement the ORDER BY. You say it matches 1000 tuples; if postgresql thinks it matches 100000, maybe it thinks scanning in order using the updated_on index will be cheaper. Another factor may be your configuration: if work_mem is set low, it may think that sorting is more expensive than it is.

You really need to show the EXPLAIN ANALYZE output of a slow query, so that we can see why it may be choosing an index scan on updated_on.

2) Even if it's not indexed, sometimes it takes a while to execute, but you don't know why because if you run it manually it works fine.

Use the auto_explain contrib module, new with 8.4. It allows you to log the EXPLAIN ANALYZE output of queries that take too long. Just logging the query puts you in exactly the problem you have now: every time you run the query it's fast.

Jeff Davis
A: 

if exactly the same query takes miliseconds in explain analyze, and 3 seconds in logs (i.e. i assume that it happens to take 3 seconds, not every call to it takes that long) - than it definitely means that it is locking issue.

depesz
A: 
  • locking
  • swapping
  • CLUSTER / VACUUM FULL in cron job
  • saturated network
  • saturated IO

check iostat, vmstat, iptraf...

peufeu