views:

101

answers:

1

Hi, I work on manage.py command which creates about 200 threads to check remote hosts. My database setup allows me to use 120 connections, so I need to use some kind of pooling. I've tried using separated thread, like this

class Pool(Thread):
    def __init__(self):
        Thread.__init__(self)        
        self.semaphore = threading.BoundedSemaphore(10)

    def give(self, trackers):
        self.semaphore.acquire()
        data = ... some ORM (not lazy, query triggered here) ...
        self.semaphore.release()
        return data

I pass instance of this object to every check-thread but still getting "OperationalError: FATAL: sorry, too many clients already" inside Pool object after init-ing 120 threads . I've expected that only 10 database connections will be opened and threads will wait for free semaphore slot. I can check that semaphore works by commenting "release()", in that case only 10 threads will work and other will wait till app termination.

As much as I understand, every thread is opening new connection to database even if actual call is inside different thread, but why? Is there any way to perform all database queries inside only one thread?

+2  A: 

Django's ORM manages database connections in thread-local variables. So each different thread accessing the ORM will create its own connection. You can see that in the first few lines of django/db/backends/__init__.py.

If you want to limit the number of database connections made, you must limit the number of different threads that actually access the ORM. A solution could be to implement a service that delegates ORM requests to a pool of dedicated ORM threads. To transmit the requests and their results from and to other threads you will have to implement some sort of message passing mechanism. Since this is a typical producer/consumer problem, the Python docs about threading should give some hints how to achieve this.

Edit: I've just googled for "django connection pooling". There are many people who complain that Django does not provide a proper connection pool. Some of them managed to integrate a separate pooling package. For PostgreSQL, I would take a look at the pgpool middleware.

Bernd Petersohn
I was able to solve this issue in two ways, by using pgpool2 or semaphore with manual connection.close() call in both cases. Pool of ORM threads is what I've tried first, but for some reason Django creates a new connection for every thread even if there are no actual ORM, as soon as you try to access any different thread with ORM - here comes new connection.P.S. pgpool looked like a nice solution, but I was unable to solve this problem using only it, because I have one longrunning app(not many small ones) and pgpool was unable to provide new pooled connections as they were all used by my app
Riz
The question is: in which thread context do you call Pool.give()? If you invoke a method of an object that inherits from Thread that method is still executed by the calling thread, not necessarily by the thread that is associated with the object. If I interpret your earlier comment above correctly, the calling threads are your checker threads. This means that all 200 checker threads attempt to create a private DB connection (since the connection is stored thread-locally). Your semaphores only limit the number of parallel ORM accesses, not the number of connections made.
Bernd Petersohn