I need an Object Pool, and rather than implement it myself, I thought I would look around for a ready-made and tested Python library.

What I found was plenty of other people looking, but not getting many straight answers, so I have brought it over here to Stack Overflow.

In my case, I have a large number of threads (using the threading module), which need to occasionally call a remote SOAP-based server. They could each establish their own connection to the server, but setting up a socket and completing the authentication process is expensive (it is throttled by the server), so I want to share a pool of connections, creating more only as needed.

If the items to pool were worker subprocesses, I might have chosen multiprocessing.pool, but they are not. If they were worker threads, I might have chosen this implementation, but they are not.

If they were MySQL connections, I might have chosen pysqlpool, but they are not. Similarly the SQLAlchemy Pool is out.

If there was one thread, using a variable number of connections/objects, I would consider this implementation, but I need it to be thread-safe.

I know I could implement this again fairly quickly, but given there are many people looking for it, I thought a canonical answer on Stack Overflow would be nice.

A: 

It's too easy to make an implementation in Python. Here's one:

# myobjectpool.py
pool = []

How do you use it?

from myobjectpool import pool
# Need to get a new object from the pool:
ob = pool.pop()  # raises IndexError if the pool is empty
# When you are done:
pool.append(ob)

Obviously, for most cases this isn't enough. The object may have state that needs to be reset, etc. But those things tend to differ from object to object. Also, resetting that state is often more work than creating the objects from scratch anyway, which is one reason object pools aren't used much.

Also, the above object pool evidently doesn't work across processes (but that didn't seem to be a problem here). If you need object pools because you are pooling some sort of external (to Python) resource, make sure that the objects have no state except that resource, and you should be fine.
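The bare list leaves two gaps: `pool.pop()` fails when the pool is empty, and nothing resets an object's state on return. A minimal sketch that keeps the list but guards the check-then-pop with a `threading.Lock`, falls back to a factory when empty, and runs an optional reset hook on release could look like this. `ListPool`, `factory` and `reset` are hypothetical names, not part of any library.

```python
import threading


class ListPool:
    """Hypothetical list-backed pool: creates objects on demand, never waits."""

    def __init__(self, factory, reset=None):
        self._factory = factory        # called when the pool is empty
        self._reset = reset            # optional per-object cleanup on release
        self._items = []
        self._lock = threading.Lock()  # makes "check, then pop" one atomic step

    def acquire(self):
        with self._lock:
            if self._items:
                return self._items.pop()
        # Pool was empty: build a fresh object outside the lock.
        return self._factory()

    def release(self, obj):
        if self._reset is not None:
            self._reset(obj)           # clear per-use state before reuse
        with self._lock:
            self._items.append(obj)


if __name__ == "__main__":
    created = []

    def make():
        created.append(object())
        return created[-1]

    pool = ListPool(make)
    a = pool.acquire()
    pool.release(a)
    b = pool.acquire()
    print(a is b, len(created))  # True 1
```

Note that this never makes a caller wait: an empty pool always means "create another object", so it has no upper bound on how many objects exist.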

Lennart Regebro
A: 

It seems to me, from your description, that what you need is a pool of connections, not of objects. For simple thread-safety, just keep the reusable connections in a Queue.Queue instance (queue.Queue in Python 3); call it pool. When a thread instantiates a connection-wrapping object, the object gets its connection via pool.get(), which automatically blocks the calling thread if no connection is currently available and resumes it as soon as one is put back; when the object is done using its connection, it returns it to the pool via pool.put().
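A minimal sketch of that approach, assuming Python 3 (where `Queue.Queue` lives at `queue.Queue`). `FakeConnection` is a hypothetical stand-in for the expensive, authenticated SOAP connection; the pool is filled up front with a fixed number of connections, and `pool.get()` blocks a thread until one is free.

```python
import queue
import threading


class FakeConnection:
    """Hypothetical stand-in for an expensive, authenticated SOAP connection."""

    def __init__(self, ident):
        self.ident = ident

    def call(self, request):
        return f"conn{self.ident} handled {request}"


# Pre-populate the pool with a fixed number of connections.
pool = queue.Queue()
for i in range(2):
    pool.put(FakeConnection(i))


def worker(request, results):
    conn = pool.get()          # blocks until a connection is free
    try:
        results.append(conn.call(request))
    finally:
        pool.put(conn)         # always hand the connection back


if __name__ == "__main__":
    results = []
    threads = [threading.Thread(target=worker, args=(n, results)) for n in range(5)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(len(results), pool.qsize())  # 5 2
```

Five threads share two connections here; the `try`/`finally` keeps a failing call from leaking a connection out of the pool.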

There's so little universally-required, general-purpose functionality in this, beyond what Queue.Queue already gives you, that it's not surprising no module providing it is well known or popular -- hard to make a module widespread when it has about 6 lines of functional code in all (e.g. to call a user-supplied connection factory to populate the queue either in advance or just-in-time up to some maximum number -- not a big added value generally, anyway). "Thick glue", thickly wrapping the underlying functionality from a standard library module without substantial added value, is an architectural minus, after all;-).
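For illustration, the handful of lines the answer mentions (a user-supplied factory that fills the queue just in time, up to some maximum) might look like the sketch below. `LazyPool`, `factory` and `max_size` are hypothetical names; it covers only the just-in-time case and is not a published module.

```python
import queue
import threading


class LazyPool:
    """Hypothetical wrapper: a Queue of connections, created lazily up to max_size."""

    def __init__(self, factory, max_size):
        self._factory = factory
        self._max_size = max_size
        self._created = 0
        self._lock = threading.Lock()    # protects the _created counter
        self._idle = queue.Queue()       # connections not currently in use

    def get(self, timeout=None):
        try:
            return self._idle.get_nowait()       # reuse an idle connection
        except queue.Empty:
            pass
        with self._lock:
            may_create = self._created < self._max_size
            if may_create:
                self._created += 1
        if may_create:
            try:
                return self._factory()           # just-in-time creation
            except BaseException:
                with self._lock:
                    self._created -= 1           # failed creation frees the slot
                raise
        # At the cap: wait for another thread's put() (raises queue.Empty on timeout).
        return self._idle.get(timeout=timeout)

    def put(self, conn):
        self._idle.put(conn)
```

Everything beyond the counter and the factory call is plain `queue.Queue` behaviour, which is the answer's point.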

Alex Martelli
Ah, right: waiting if there is nothing left in the pool, that's what the list lacks. I thought I was clever using a list instead of a Queue, but too clever, actually. :)
Lennart Regebro
@Lennart, there's also no _guarantee_ of thread safety with a list: you may or may not run into problems depending on the Python implementation -- with Queue.Queue, your thread safety is guaranteed.
Alex Martelli
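To make the difference discussed in these two comments concrete, here is a small sketch (Python 3; `take_from_list` and `take_from_queue` are hypothetical helper names): a list gives up immediately when empty, while a `queue.Queue` lets the caller wait, optionally with a timeout, for another thread to return an item.

```python
import queue
import threading


def take_from_list(pool):
    """Bare list pool: fails immediately when it runs dry."""
    try:
        return pool.pop()
    except IndexError:
        return None


def take_from_queue(pool, timeout):
    """Queue pool: waits up to `timeout` seconds for another thread's put()."""
    try:
        return pool.get(timeout=timeout)
    except queue.Empty:
        return None


if __name__ == "__main__":
    q = queue.Queue()
    # Another thread returns a connection after 0.1 s.
    threading.Timer(0.1, q.put, args=("conn",)).start()
    print(take_from_list([]))        # None: the list gives up at once
    print(take_from_queue(q, 1.0))   # conn: the queue waited for the put()
```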
Python has a thread-safe Queue already built in? I didn't know that! Yes, that'll speed up the implementation (which I expected to be short, with most of the time spent thinking through concurrency issues). Sorry, I didn't understand your distinction between a "pool of connections" and a "pool of objects". I said I wanted "to share a pool of connections", but each of those connections is wrapped up in an object, so it is indeed a pool of objects too. The distinction I was trying to make, though, is that the connection objects were NOT active (unlike multiprocessing.pool).
Oddthinking
@Oddthinking, yep, the `Queue` module in Python's standard library is exactly that -- a thread-safe queue (the base one is FIFO; there are LIFO and priority variants too). As for "what to pool", my point is: pool connections that are as lightly wrapped, or unwrapped, as you can manage, because making the connection is the costly part; wrapping a currently unused connection in a brand-new wrapping object that adds all the trimmings you want for one transaction's duration should be cheap and fast in comparison, so there's no need to pool the wrappers!
Alex Martelli
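A sketch of "pool the raw connection, wrap it freshly per transaction", assuming Python 3. `RawConnection`, `Transaction` and `transaction()` are hypothetical names; only the raw connection is pooled, while the wrapper and its per-transaction state are created and discarded each time.

```python
import contextlib
import queue


class RawConnection:
    """Hypothetical stand-in for the expensive, pooled socket/session."""

    def send(self, payload):
        return f"sent {payload!r}"


class Transaction:
    """Hypothetical cheap wrapper that lives for one transaction only."""

    def __init__(self, conn):
        self._conn = conn
        self.log = []              # per-transaction state, discarded afterwards

    def request(self, payload):
        reply = self._conn.send(payload)
        self.log.append(reply)
        return reply


raw_pool = queue.LifoQueue()       # LIFO: reuse the most recently returned connection
raw_pool.put(RawConnection())


@contextlib.contextmanager
def transaction(timeout=None):
    conn = raw_pool.get(timeout=timeout)   # pool the costly part...
    try:
        yield Transaction(conn)            # ...and wrap it freshly each time
    finally:
        raw_pool.put(conn)                 # only the raw connection goes back


if __name__ == "__main__":
    with transaction() as tx:
        print(tx.request("ping"))   # sent 'ping'
        print(len(tx.log))          # 1
```

Using `queue.LifoQueue` instead of `queue.Queue` is a design choice here: it hands out the most recently used connection first, which tends to keep a few connections warm while others sit idle.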