views:

129

answers:

2

I'm trying to figure out the best way to minimize resource utilization when I have PHP talking to various backend services (e.g. Amazon S3 or any other random web services -- I'd like a general solution). Ideally, I'd like to have a single persistent connection to the backend (or maybe a small pool of persistent connections) with some caching, and then have all of the PHP tasks share it. We can consider it all read-only for the purposes of this question. It's not obvious to me how to do this in PHP. There's the database-specific stuff like mysql_pconnect(), but that doesn't really do it for me.

One idea I've had, which seems seems somewhat suboptimal (but is still better than having every single request create and destroy a new connection) is to use a local caching proxy (in a separate process) that would effectively do the pooling and caching. PHP would still be opening and closing a connection for every request, but at least it would be to a local process, so it should be a little faster (and it would reduce load on the backends). But it doesn't seem like this kind of craziness should be necessary. There's gotta be a better way. This is easy in other languages. Please tell me what I'm missing!

+1  A: 

There's a large ideological disconnect between the various web technologies. Some are essentially daemons that run full-time in the background, and handle requests passed in on their own. Because there's a process always running, you can have a pool of already open existing working connections.

PHP (and normal CGI scripts) does not have a daemon behind the scenes. Every time a request comes in, the PHP interpreter is started up with a clean slate, compiles the scripts, and runs the bytecode. There's no persistence. The PHP database functions that support persistent connections establish the connection at the web server child level (i.e. mod_php attached to an Apache process). This isn't exactly a connection pool, as you can only ever see the persistent connection attached to your own process.

Without having a daemon or similar process sitting behind the scenes to hand out resources, you won't get real connection pooling.

Keep in mind that most new connections to most services are not heavy-weight, and non-database connections that are heavy-weight might not be friendly to the concept of a connection pool.

Before you think about writing your own PHP-based daemon to handle stuff like this, keep in mind that it may already be a solved problem. Python came up with something called WSGI, with a similar implementation in Ruby called Rack. Perl also has something remarkably similar but I can't remember the name of it off the top of my head. A quick look at Google didn't show any PHP implementations of WSGI, but that doesn't mean they don't exist...

Charles
A: 

Because S3 and other webservices use HTTP as their transport, you won't get a significant benefit from caching the connection.

  • Although you may be using an API that appears to authenticate as a first step, looking at the S3 Documentation, the authentication happens with every request - so no benefit in authenticating once and reusing a connection
  • Web service requests over HTTP are lightweight and typically stateless. Once your request has been answered, no resources (connection or sesson state) are consumed on the server. This allows the web service implementer to use many machines to answer your request without tying up resources on a particular server
Robert Christie