Setup: multiple webservers, running mod_wsgi, Apache, and pgbouncer which connects to the shared DB running Postgres 8.3.6. Application is running Django.
What we're seeing: 'idle in transaction' queries on the DB that hang out for a long time. In order to see them, I'll run something like this:
SELECT query_start, procpid, client_addr, current_query FROM pg_stat_activity WHERE query_start < NOW() - interval '5 minutes';
Most results of course are just IDLE connections that pgbouncer is keeping open for use, but sometimes there will be these old 'IDLE in transaction' queries. I understand that this means that there is a query transaction which is waiting for something, or something which had a BEGIN but hasn't reached a COMMIT or ROLLBACK.
My next step was to try to use pg_locks to determine what the process is waiting on:
select pg_class.relname, pg_locks.transactionid, pg_locks.mode, pg_locks.granted as "g",
pg_stat_activity.current_query, pg_stat_activity.query_start,
age(now(),pg_stat_activity.query_start) as "age", pg_stat_activity.procpid
from pg_stat_activity,pg_locks
left outer join pg_class on
(pg_locks.relation = pg_class.oid)
where pg_locks.pid=pg_stat_activity.procpid and pg_stat_activity.procpid = <AN IDLE TRANSACTION PROCESS>
order by query_start;
A lot of times, the result I get looks like so:
relname | transactionid | mode | g | current_query | query_start | age | client_addr | procpid
---------+---------------+-----------------+---+-----------------------+------------------------------+-----------------+----------------+---------
| | AccessShareLock | t | <IDLE> in transaction | 2010-07-22 15:33:11.48136-04 | 00:23:35.029045 | 192.168.100.99 | 1991
| | AccessShareLock | t | <IDLE> in transaction | 2010-07-22 15:33:11.48136-04 | 00:23:35.029045 | 192.168.100.99 | 1991
| | AccessShareLock | t | <IDLE> in transaction | 2010-07-22 15:33:11.48136-04 | 00:23:35.029045 | 192.168.100.99 | 1991
| | AccessShareLock | t | <IDLE> in transaction | 2010-07-22 15:33:11.48136-04 | 00:23:35.029045 | 192.168.100.99 | 1991
| | AccessShareLock | t | <IDLE> in transaction | 2010-07-22 15:33:11.48136-04 | 00:23:35.029045 | 192.168.100.99 | 1991
| | AccessShareLock | t | <IDLE> in transaction | 2010-07-22 15:33:11.48136-04 | 00:23:35.029045 | 192.168.100.99 | 1991
| | AccessShareLock | t | <IDLE> in transaction | 2010-07-22 15:33:11.48136-04 | 00:23:35.029045 | 192.168.100.99 | 1991
| | AccessShareLock | t | <IDLE> in transaction | 2010-07-22 15:33:11.48136-04 | 00:23:35.029045 | 192.168.100.99 | 1991
| | ExclusiveLock | t | <IDLE> in transaction | 2010-07-22 15:33:11.48136-04 | 00:23:35.029045 | 192.168.100.99 | 1991
| | AccessShareLock | t | <IDLE> in transaction | 2010-07-22 15:33:11.48136-04 | 00:23:35.029045 | 192.168.100.99 | 1991
(10 rows)
I'm not sure how to read this (I guess it stems from not really understanding pg_locks). There's no relname, so is it saying that it's waiting on nothing? I thought that if granted was 'true', it had the lock. Since all these results are granted, is pg_locks showing me the locks that it has rather than what it's waiting for?
Right now I'm 'fixing' this by restarting Apache, which seems to shake the transactions loose, but obviously that's not a real solution. I'm looking for PSQL to give me a place on where to look to figure this out, especially since Django is supposed to manage its connections and transactions automatically.