What is the best way to set up your pool with respect to:

  1. When do you create connections?
  2. When do you close connections, and would you close all of them?
  3. Do you test that connections are still good? When and how?
  4. How do you figure out a good number for the maximum number of connections?
  5. What sort of monitoring do you have in place to ensure users of the pool are well behaved? Can you stop one bad piece of code from taking out everything?
  6. Have you written your own pool, or used a third-party library?

I believe this is an agnostic question, but comments about "features" of particular databases/languages are welcome. For example, it might be slower or more expensive to connect on some databases than others.

To clarify, I do not intend to write a pool from scratch, this question is more about how to configure an existing library that does pooling.

+1  A: 

Why re-invent the wheel?

Someone has probably already solved the problem, and better.

If you're in the Java world, you can use Commons DBCP.

matt b
+2  A: 

Here is the rationale I used for a recent implementation.

1/ Have two sorts of connections in your connection pool:

  • ready, meaning open but not in use by a client.
  • active, meaning in use by a client.

2/ Have your connection pool maintain a small number of ready connections, minimum of N and maximum of M. N can be adjusted depending on the peak speed at which your clients request connections. If the number of ready connections ever drops to zero, you need a bigger N. If the number is consistently high (say above 10), you need a lower N.

3/ When a client wants a connection, give them one of the ready ones (making it active), then immediately open a new one if there's now less than N ready (but don't make the client wait for this to complete, or you'll lose the advantage of pooling). This ensures there will always be at least N ready connections. If none are ready when the client wants one, they will have to wait around while you create a new one.

4/ When the client finishes with an active connection, return it to the ready state if there's less than M ready connections. Otherwise close it. This prevents you from having more than M ready connections.

5/ Periodically recycle the ready connections to prevent stale connections. If there are more than N ready connections, just close the oldest one. Otherwise, close the oldest and open a replacement.

This has the advantage of having enough ready AND youthful connections available in your connection pool without overloading the server.
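The ready/active scheme above can be sketched in a few lines. This is a minimal, single-threaded illustration, not the author's implementation: `Pool` is a hypothetical class, sqlite3 in-memory connections stand in for real server connections, and the "top up" step is done inline where a real pool would do it in the background so the client doesn't wait.

```python
import sqlite3
from collections import deque

class Pool:
    """Sketch of the ready/active scheme: keep between N and M ready connections."""
    def __init__(self, n=2, m=5):
        self.n, self.m = n, m
        self.ready = deque(sqlite3.connect(":memory:") for _ in range(n))
        self.active = set()

    def acquire(self):
        # Hand out a ready connection, or open a new one if none are ready.
        conn = self.ready.popleft() if self.ready else sqlite3.connect(":memory:")
        self.active.add(conn)
        # Step 3: top up to N ready. A real pool does this asynchronously.
        while len(self.ready) < self.n:
            self.ready.append(sqlite3.connect(":memory:"))
        return conn

    def release(self, conn):
        # Step 4: return to ready, unless we already have M ready.
        self.active.discard(conn)
        if len(self.ready) < self.m:
            self.ready.append(conn)
        else:
            conn.close()
```

The invariant to check against the steps above: after any acquire there are still at least N ready connections, and after any release there are at most M.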

paxdiablo
+3  A: 

I wrote a connection pool for the database in Java when it was just a design pattern and not a common library. Now I use the one built into Tomcat.

I used a thread to monitor several aspects of the pool and several parameters to control its behavior...

  1. minimumInPool="3"... These first three are created upon launch. The pool is never allowed to drop below three.
  2. maximumIdleTimeBeforeRemoval="60"... If a connection is idle for an hour, then drop it and create a new one. Idle time probably means the pool is down to the minimum of three.
  3. maximumInUseTimeBeforeRemoval="30"... If a given connection has been checked out for over 30 minutes, then something is probably wrong. Recall it, and kill the connection.
  4. maximumTimeBeforeRemoval="60"... Remove it if it is over 60 minutes old.
  5. maximumUsageBeforeRemoval="1000"... Remove it if it has been checked out over 1000 times.
  6. monitorInterval="15"... Check the above parameters every 15 minutes.

This served me very well for a couple of years. The highest I ever saw the pool was 151 connections during a wild peak. Usually the pool was at about a dozen during heavy usage and idled down to the minimum three in the early morning hours.
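The monitor thread's decision logic for those parameters might look something like the sketch below. This is an illustration of the rules listed above, not the original code: `should_remove` and the per-connection `stats` dict are hypothetical, and all times are in minutes to mirror the parameter names.

```python
# Limits mirror the parameter names in the answer (times in minutes).
LIMITS = {
    "maximumIdleTimeBeforeRemoval": 60,
    "maximumInUseTimeBeforeRemoval": 30,
    "maximumTimeBeforeRemoval": 60,
    "maximumUsageBeforeRemoval": 1000,
}

def should_remove(stats, now, limits=LIMITS):
    """stats: dict with 'created', 'last_used', 'checked_out_at' (minutes,
    checked_out_at is None when idle) and 'uses' (checkout count)."""
    if stats["checked_out_at"] is None:
        # Idle connection: drop it if unused for too long.
        if now - stats["last_used"] > limits["maximumIdleTimeBeforeRemoval"]:
            return True
    elif now - stats["checked_out_at"] > limits["maximumInUseTimeBeforeRemoval"]:
        return True  # checked out too long; something is probably wrong
    if now - stats["created"] > limits["maximumTimeBeforeRemoval"]:
        return True  # too old
    return stats["uses"] > limits["maximumUsageBeforeRemoval"]
```

A monitor thread would run this check over every pooled connection on each `monitorInterval` tick, recalling and closing any connection for which it returns true.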

I used Oracle's JDBC thin drivers and connected to an Oracle database.

dacracot
+1  A: 

I’m not sure of the context in which you are using your connections, but I can share what seems to work for me.

I use SQL Server as my back end and use a combination of caching with it to get better performance. My practice is to keep the connection open only if I actually need it, and not to pool connections, so that they clean up right away and I can see in SQL Activity Monitor exactly what is active and what’s not. Each connection takes up memory, so it’s nice to keep the count to a dull roar when connections aren’t needed.

Before I answer the connection open and close question, let me say that caching is really important. Getting an object out of the cache is going to save you a ton of time. In some of my asp.net apps, when caching is on in dev, I have found that I can hardly measure the latency, whereas a DB call might take anywhere from 15ms to 45ms to complete, and this isn’t even considering other latency factors or load. The other method that I use is a good object structure for my data, so that I only make a DB update if something changes. I’ve implemented some methods on my objects to make sure that I’m doing as little IO as possible.

That being said we all know that we need to access and write to our DB at some point so I follow two principles:

1.) Keep the doors and windows closed to save on energy. An open connection in one place means that it’s not available in another (or the memory and other resources are more limited). We have turned pooling off because it has resulted in better performance for us.

2.) I do as much in batch, or at once, as I can while the connection is open. This is a bit more complicated, so let me explain.

One method I’ve used is to pass my connection object down the pipe so that all the objects can use one connection object. This results in one connection being opened and closed instead of maybe 10 or more, depending on your app. A good example of this is one of our purchasing models that takes advantage of the power of SQL Server for gathering statistics and hashing out complicated ordering patterns. It doesn’t make sense to keep opening and closing the connection when you’re making 200K+ DB lookups, or whatever the app is doing.

The other part of this is that when I use objects, I try to bundle my updates to reduce the time that I keep the connection open. So doing a SCOPE_IDENTITY on the insert call lets me take care of both my insert and the lookup for the unique ID to add to my object before caching it.

Back in the day when I first was developing ASP apps, I’d actually open the connection as soon as the page started to load and then close it after. I don’t recommend doing that anymore. Nowadays there is a large benefit to these sorts of abstractions and layers, and I would recommend any novice programmer pay careful attention to them.
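The pass-the-connection-down pattern can be sketched as follows. This is an illustrative example, not the author's code: `insert_order` and the `orders` table are made up, sqlite3 stands in for SQL Server, and `cursor.lastrowid` plays the role of `SCOPE_IDENTITY()` in retrieving the generated key on the same call.

```python
import sqlite3

def insert_order(conn, item):
    # The shared connection is passed in rather than opened per call.
    # lastrowid returns the generated key, analogous to SCOPE_IDENTITY().
    cur = conn.execute("INSERT INTO orders(item) VALUES (?)", (item,))
    return cur.lastrowid

conn = sqlite3.connect(":memory:")  # opened once, passed down the pipe
conn.execute("CREATE TABLE orders(id INTEGER PRIMARY KEY, item TEXT)")
ids = [insert_order(conn, item) for item in ("ram", "disk", "cpu")]
conn.commit()
conn.close()
```

Three inserts, and their generated keys, go through one connection instead of three open/close cycles.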

My two cents:

Cache your data! Cache your data! Cache your data! Do as little DB access as possible when you can’t cache and then cache your data!

Middletone
+2  A: 

Jakarta Commons DBCP already does all the stuff you listed:

  • it creates connections as needed and manages them in a pool
  • it can close connections if they haven't been used for a certain period of time
  • it can execute a query on a connection before handing it out, and if there is an error, the connection is thrown away and a new one is created. Connections can also be tested periodically while idle.
  • you can set a limit on the connections that will be created and also on the minimum number of connections to have ready. The limit of course depends a lot on your application.
  • I don't know how but DBCP knows when a connection is not being closed and closes it for you, throwing an exception so that you know what happened when you see your log.
  • DBCP has a timeout parameter which is very useful. If all the connections in the pool are in use, it will wait that period of time for a connection to be returned to the pool, and if none becomes available when the timeout is reached, you get an error.

You can fine-tune your pool by playing with the minimum number of connections, the maximum number of connections to be created, and the timeout. A longer timeout will allow you to have a lower limit of connections, while a shorter timeout will probably require a larger number. This depends heavily on what your application does and how it uses the connections.
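The interplay of those three knobs can be shown with a small stand-in. This is not DBCP: `BoundedPool` is a hypothetical class using stdlib `queue` and `sqlite3`, with a fixed maximum, a borrow timeout, and a validation query run before handing a connection out, mirroring the behaviors listed above.

```python
import queue
import sqlite3

class BoundedPool:
    """Toy pool with a max size, a borrow timeout, and pre-handout validation."""
    def __init__(self, max_total=2, timeout=0.1):
        self.idle = queue.Queue()
        for _ in range(max_total):
            self.idle.put(sqlite3.connect(":memory:", check_same_thread=False))
        self.timeout = timeout

    def borrow(self):
        # Wait up to `timeout` for a connection; queue.Empty signals exhaustion.
        conn = self.idle.get(timeout=self.timeout)
        try:
            conn.execute("SELECT 1")  # validation query before handing it out
        except sqlite3.Error:
            # Broken connection: throw it away and create a fresh one.
            conn = sqlite3.connect(":memory:", check_same_thread=False)
        return conn

    def give_back(self, conn):
        self.idle.put(conn)
```

With a small `max_total` and a short timeout, exhaustion surfaces quickly as an error; raising the timeout trades that error for waiting, which is exactly the tuning decision described above.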

Chochos
+2  A: 

I agree with matt b that we should not reinvent the wheel.

However, using Commons DBCP is arguable based on the answers to this and this question. There are better alternatives mentioned there, such as c3p0 or Proxool.

Or you may use an RDBMS-dependent connection pooling mechanism.

rics