Hi folks,

I'm using Django with Apache, mod_wsgi and PostgreSQL (all on the same host), and I need to handle a lot of simple dynamic page requests (hundreds per second). The bottleneck I ran into is that Django doesn't keep a persistent database connection and reconnects on each request, which takes about 5ms. In a benchmark I found that with a persistent connection I can handle about 500 r/s, while without one I get only 50 r/s.

Does anyone have any advice? How can I modify Django to use a persistent connection, or otherwise speed up the connection from Python to the DB?

Thanks in advance.

+3  A: 

Disclaimer: I have not tried this.

I believe you need to implement a custom database back end. There are a few examples on the web that show how to implement a database back end with connection pooling.

Using a connection pool would probably be a good solution in your case, as the network connections are kept open when connections are returned to the pool.

  • This post accomplishes this by patching Django (one of the comments points out that it is better to implement a custom back end outside of the core Django code)
  • This post is an implementation of a custom db back end

Both posts use MySQL - perhaps you can use similar techniques with PostgreSQL.

Edit:

  • The Django Book mentions PostgreSQL connection pooling, using pgpool (tutorial).
  • Someone posted a patch for the psycopg2 back end that implements connection pooling. I suggest creating a copy of the existing back end in your own project and patching that one (a rough sketch of the pooling idea follows below).
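
For the curious, the core of the pooling idea is small. Here is a rough, untested sketch using psycopg2's own pool module (the connection parameters are placeholders; a real back end would wire this into DatabaseWrapper._cursor rather than use it at module level):

from psycopg2.pool import ThreadedConnectionPool

# A process-wide pool of open connections (DSN values are placeholders).
pool = ThreadedConnectionPool(
    minconn=1, maxconn=20,
    database='mydb', user='myuser', password='secret', host='localhost')

conn = pool.getconn()        # borrow an already-open connection
try:
    cur = conn.cursor()
    cur.execute('SELECT 1')
    conn.commit()
finally:
    pool.putconn(conn)       # return it to the pool, still open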
codeape
I've tried pgpool, but it didn't improve the situation much (I still need to reconnect each time). pgpool is designed for somewhat different purposes (failover, replication, etc.).
HardQuestions
+5  A: 

In Django trunk, edit django/db/__init__.py and comment out the line:

signals.request_finished.connect(close_connection)

This signal handler causes Django to disconnect from the database after every request. I don't know what all the side effects of doing this will be, but it makes no sense to open a new connection for every request; it destroys performance, as you've noticed.
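
If you would rather not edit Django itself, the same effect can presumably be had by disconnecting the handler from your own code instead (a sketch, untested; e.g. at the bottom of settings.py):

from django.core import signals
from django.db import close_connection

# Undo the connect() that django/db/__init__.py makes at import time,
# so connections are no longer closed at the end of each request.
signals.request_finished.disconnect(close_connection)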

I'm using this now, but I haven't done a full set of tests to see if anything breaks.

I don't know why everyone thinks this needs a new back end, a special connection pooler, or some other complex solution. This seems very simple, though I don't doubt there are obscure gotchas that made them do it in the first place--those should be dealt with more sensibly; 5ms of overhead on every request is quite a lot for a high-performance service, as you've noticed. (It takes me 150ms--I haven't figured out why yet.)

Edit: another necessary change is in django/middleware/transaction.py: remove the two transaction.is_dirty() tests and always call commit() or rollback(). Otherwise, it won't commit a transaction if the request only read from the database, which will leave locks open that should be released. A sketch of the resulting middleware follows.
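
For reference, this is roughly what the changed middleware would look like (approximating the TransactionMiddleware of that era; treat it as a sketch, not a verbatim diff of any particular revision):

from django.db import transaction

class TransactionMiddleware(object):
    def process_request(self, request):
        transaction.enter_transaction_management()
        transaction.managed(True)

    def process_exception(self, request, exception):
        transaction.rollback()            # was guarded by is_dirty()
        transaction.leave_transaction_management()

    def process_response(self, request, response):
        if transaction.is_managed():
            transaction.commit()          # was guarded by is_dirty()
            transaction.leave_transaction_management()
        return response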

Glenn Maynard
See http://groups.google.com/group/django-users/browse_thread/thread/9b58de1380b1afd0 django-postgres-persistent-db-connection.diff for a more general implementation of this, though it's only implemented for PostgreSQL. (Not that the OP is even listening, but for anyone else who finds their way here...)
Glenn Maynard
A: 

I made a small custom psycopg2 back end that implements a persistent connection using a global variable. With it I was able to improve the number of requests per second from 350 to 1600 (on a very simple page with a few selects). Just save it in a file called base.py in any directory (e.g. postgresql_psycopg2_persistent) and set in settings:

DATABASE_ENGINE = 'projectname.postgresql_psycopg2_persistent'

NOTE!!! The code is not thread-safe - you can't use it with Python threads without unpredictable results. With mod_wsgi, please use prefork daemon mode with threads=1.
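
With Apache/mod_wsgi that means something along these lines (process count, group name and paths below are placeholders):

WSGIDaemonProcess mysite processes=8 threads=1
WSGIProcessGroup mysite
WSGIScriptAlias / /path/to/mysite/django.wsgi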


# Custom DB backend postgresql_psycopg2 based
# implements persistent database connection using global variable

# DatabaseError and IntegrityError are imported only so Django can find
# them on this module, as it expects of any back end.
from django.db.backends.postgresql_psycopg2.base import DatabaseError, \
    DatabaseWrapper as BaseDatabaseWrapper, IntegrityError
from psycopg2 import OperationalError

# The process-wide connection shared by every DatabaseWrapper instance;
# this is what makes the connection persistent (and not thread-safe).
connection = None

class DatabaseWrapper(BaseDatabaseWrapper):
    def _cursor(self, *args, **kwargs):
        global connection
        if connection is not None and self.connection is None:
            try: # Check if connection is alive
                connection.cursor().execute('SELECT 1')
            except OperationalError: # The connection is not working, need reconnect
                connection = None
            else:
                self.connection = connection
        cursor = super(DatabaseWrapper, self)._cursor(*args, **kwargs)
        if connection is None and self.connection is not None:
            connection = self.connection
        return cursor

    def close(self):
        # Commit instead of closing, so the socket stays open for the
        # next request; only this wrapper's reference is dropped.
        if self.connection is not None:
            self.connection.commit()
            self.connection = None

Or here is a thread-safe one, but Python threads don't use multiple cores, so you won't get as big a performance boost as with the previous one. You can use it with multiple processes too.

# Custom DB backend postgresql_psycopg2 based
# implements persistent database connection using thread local storage
from threading import local

from django.db.backends.postgresql_psycopg2.base import DatabaseError, \
    DatabaseWrapper as BaseDatabaseWrapper, IntegrityError
from psycopg2 import OperationalError

# One connection per thread, reused across the requests that thread serves.
threadlocal = local()

class DatabaseWrapper(BaseDatabaseWrapper):
    def _cursor(self, *args, **kwargs):
        if hasattr(threadlocal, 'connection') and threadlocal.connection is \
            not None and self.connection is None:
            try: # Check if connection is alive
                threadlocal.connection.cursor().execute('SELECT 1')
            except OperationalError: # The connection is not working, need reconnect
                threadlocal.connection = None
            else:
                self.connection = threadlocal.connection
        cursor = super(DatabaseWrapper, self)._cursor(*args, **kwargs)
        if (not hasattr(threadlocal, 'connection') or threadlocal.connection \
             is None) and self.connection is not None:
            threadlocal.connection = self.connection
        return cursor

    def close(self):
        if self.connection is not None:
            self.connection.commit()
            self.connection = None
HardQuestions
Please don't do this. It is not thread-safe at all. Use proper connection pooling like pgpool.
Alex Gaynor
pgpool won't help, because Django needs to reconnect every time anyway. I know it's not thread-safe code (and I have a thread-safe version that uses the psycopg2.pool module, just not published yet), but I use Python with mod_wsgi in daemon mode without threads, with pure prefork, so it's safe here. I'll add the note - thanks.
HardQuestions
+1  A: 

Try PgBouncer

https://developer.skype.com/SkypeGarage/DbProjects/PgBouncer
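
For anyone trying this: PgBouncer sits between Django and PostgreSQL and hands out already-open server connections, so the per-request "connect" becomes a cheap handshake with the pooler instead of a full backend start. A minimal pgbouncer.ini might look like this (database name, ports and auth settings are placeholders, not from the original post):

[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = session

Then point Django's DATABASE_HOST/DATABASE_PORT at 127.0.0.1:6432.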

dtamborelli
Like pgpool, it won't eliminate the per-request connection overhead, which is huge. The connection-opening code is a real bottleneck.
HardQuestions
@Mike TK First, it is not the same as pgpool. PgBouncer works with libevent and manages connections asynchronously, as opposed to pgpool forking a process for each connection the way Postgres itself does (the only difference being that pgpool keeps the processes alive). In my experience, using PgBouncer (compared to not using any pooling at all) gives a noticeable speedup.
Vasil
@Vasil Thanks, I'll try it out.
HardQuestions