I need information on creating a connection pool to a database (irrespective of the database), and on how efficient pools are. Under what conditions can they enhance performance?

How do I create one explicitly?

+1  A: 

The intro page to Apache DBCP sums it up nicely:

Creating a new connection for each user can be time consuming (often requiring multiple seconds of clock time), in order to perform a database transaction that might take milliseconds. Opening a connection per user can be unfeasible in a publicly-hosted Internet application where the number of simultaneous users can be very large. Accordingly, developers often wish to share a "pool" of open connections between all of the application's current users. The number of users actually performing a request at any given time is usually a very small percentage of the total number of active users, and during request processing is the only time that a database connection is required. The application itself logs into the DBMS, and handles any user account issues internally.

How efficient are they? It depends on the implementation. Typically I would expect a pool to instantiate connections either at start-up or on request. The first connection will require a real connection to the database, and thereafter when you request a connection, you're given an existing pooled connection. So the first connection request will take the most time, and afterwards you're just pulling objects from a collection (very fast).
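To make the "pulling objects from a collection" idea concrete, here is a minimal sketch of a pool that creates resources on request and reuses them afterwards. It is not how DBCP is implemented; `SimplePool`, `borrow`, `giveBack` and `createdCount` are hypothetical names, and the generic type `T` stands in for `java.sql.Connection` so the example runs without a database.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical minimal pool; T stands in for java.sql.Connection. */
class SimplePool<T> {
    private final BlockingQueue<T> idle;     // resources waiting to be reused
    private final Supplier<T> factory;       // the expensive "open a connection" step
    private final int maxSize;
    private final AtomicInteger created = new AtomicInteger();

    SimplePool(int maxSize, Supplier<T> factory) {
        this.maxSize = maxSize;
        this.factory = factory;
        this.idle = new ArrayBlockingQueue<>(maxSize);
    }

    /** Returns an idle resource if there is one, creates one while under the
     *  limit, and otherwise blocks until another caller gives one back. */
    T borrow() {
        T pooled = idle.poll();
        if (pooled != null) {
            return pooled;                   // cheap path: no new connection
        }
        while (true) {
            int n = created.get();
            if (n >= maxSize) {
                try {
                    return idle.take();      // pool exhausted: wait for a return
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", e);
                }
            }
            if (created.compareAndSet(n, n + 1)) {
                try {
                    return factory.get();    // expensive path: open a new one
                } catch (RuntimeException e) {
                    created.decrementAndGet();
                    throw e;
                }
            }
        }
    }

    /** Puts a resource back so the next borrow() can reuse it. */
    void giveBack(T resource) {
        idle.offer(resource);
    }

    int createdCount() {
        return created.get();
    }
}
```

Only the first `borrow()` pays the creation cost; a `giveBack()` followed by another `borrow()` returns the same object. A production pool does much more (validation, timeouts, eviction), which is why an existing library is usually the better choice.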

Brian Agnew
Since Sachin specifically asked where pools can address performance, I would add that they generally only make sense in a client-server application, where you have fewer connections than users. In particular, they would not make sense in a thick-client app where you have one user connecting to a database.
John Stauffer
+1  A: 

Creating a connection to a database is a very expensive operation. A connection pool is a set of database connections that are created and cached. Any time a new connection to a database is needed, one from the pool is used instead of creating a new one. Some platforms like .NET + SQL Server use connection pools by default (you don't need to create your own). So they basically enhance performance by saving the time it takes to create a new connection each time.

Ralph Stevens
+1  A: 

Using a connection pool, you save time at every access because the connection is already established.

Moreover, at least on Oracle, you keep the compiled statement linked to the connection, so repeated execution of the same SQL statement is even quicker.

(see PreparedStatement if you are in Java/JDBC)

The only risk of hurting performance is keeping too many idle connections in your pool: the associated resources (on your side and on the database) are wasted.

Fouteier
Keep in mind that connections also lock up resources (threads, buffers) in the server. And an established connection is always authenticated via a specific user/password pair. So connection pooling only works when all connections use the same database account.
Carsten Kuckuk
Yes, you're totally right about the single database account concern. (This can be a problem when migrating from a client/server application to the web, for example, if the authorization is in the DB, based on the connected user.)
Fouteier
A: 

Creating a database connection may or may not be an expensive operation, depending on your environment and what you intend to do with it.

If you're going to run a single very easy query, then connecting probably takes as long as (or longer than) the query.

Some databases have a much bigger connection overhead than others; if tuned correctly, MySQL should have very little (above the time to make a TCP connection and do the protocol handshake). However, if latency to your server is very high, even this can be quite significant (particularly if you intend to do only a few queries).

If you're planning to do, say, 100 queries, or a few really slow queries, then the connection time disappears into insignificance.

In general I'd say open a new connection each time until you can demonstrate that it's a real performance problem. Using connection pooling can lead to BUGS, which we don't like:

  • Connection state wasn't COMPLETELY reset after the previous use in the pool - so some state lingers and creates unexpected behaviour resulting in a bug
  • Connection was closed in some way (perhaps by a stateful firewall timeout) which cannot be detected, therefore an app tries to use a closed connection, causing a long delay or failure
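Pool libraries usually guard against both problems by resetting a connection when it is returned and checking it before handing it out again. The sketch below illustrates that idea only; `ValidatingPool` and its method names are hypothetical. For JDBC the check could be built on `Connection.isValid(int)` and the reset on `rollback()` plus `setAutoCommit(true)`, each wrapped to handle `SQLException`. Generic types are used here so the example runs without a database.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical pool that resets on return and validates on borrow. */
class ValidatingPool<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;    // opens a fresh resource
    private final Predicate<T> isUsable;  // liveness check before reuse
    private final Consumer<T> reset;      // clears leftover state on return

    ValidatingPool(Supplier<T> factory, Predicate<T> isUsable, Consumer<T> reset) {
        this.factory = factory;
        this.isUsable = isUsable;
        this.reset = reset;
    }

    /** Hands out the first idle resource that passes the check;
     *  stale ones are dropped, and a fresh one is created if none is left. */
    synchronized T borrow() {
        T candidate;
        while ((candidate = idle.pollFirst()) != null) {
            if (isUsable.test(candidate)) {
                return candidate;
            }
            // Stale resource: discard it and try the next idle one.
        }
        return factory.get();
    }

    /** Resets the resource before it becomes visible to the next borrower. */
    synchronized void giveBack(T resource) {
        reset.accept(resource);
        idle.addFirst(resource);
    }
}
```

Validation costs a round trip per borrow in a real JDBC pool, so libraries typically make it configurable. It reduces the risks listed above but does not remove them: a connection can still die between the check and its first use.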
MarkR
For a web environment you really don't want to get a connection (from a pool or otherwise) at the beginning of a request, perform hundreds of queries, then close it. That effectively locks your connection (which is a limited resource on the database side), and thus limits the number of users your website can support. It is far better to take a connection from the pool, use it, and put it back into the pool (before any code that takes a lot of time). Considering that supporting connection pools is a few lines of code (once only in your datasource manager utility class), there is no excuse not to use them.
JeeBee
A: 

Have a look at the benchmark section of BoneCP (http://jolbox.com) for some numbers. Remember that PreparedStatements etc. are tied to a connection, so you'll need to prepare them again and again if you're dealing with connections yourself (a connection pool will cache those for you too).

My best solution so far: Use a lazyDataSource that only gives you a connection when you really need it (i.e. not blindly - if the data can come from a cache then you can avoid the database hit)
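The "only when you really need it" idea can be sketched as a wrapper that defers acquisition until first use. This is an illustration of the principle, not BoneCP's API; `LazyResource` and `wasAcquired` are hypothetical names, and the supplier stands in for something like `dataSource.getConnection()`.

```java
import java.util.function.Supplier;

/** Hypothetical wrapper that acquires the underlying resource on first use only. */
class LazyResource<T> implements Supplier<T> {
    private final Supplier<T> source;  // e.g. a call that borrows from a pool
    private T value;
    private boolean acquired;

    LazyResource(Supplier<T> source) {
        this.source = source;
    }

    /** Acquires the resource the first time it is asked for, then reuses it. */
    @Override
    public synchronized T get() {
        if (!acquired) {
            value = source.get();
            acquired = true;
        }
        return value;
    }

    /** True once the underlying resource has actually been acquired. */
    synchronized boolean wasAcquired() {
        return acquired;
    }
}
```

If a request is served entirely from a cache, `get()` is never called and no connection leaves the pool. Code that does acquire one should check `wasAcquired()` at the end of the request so it only releases what it actually took.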

A: 

Your question is a bit ambiguous:

Do you want to write your own connection pool implementation? If so, this is a nice starting point: http://java.sun.com/developer/onlineTraining/Programming/JDCBook/conpool.html But this is highly discouraged for production environments. It is better to use an existing and thoroughly tested connection pooling API, like DBCP or C3P0.

Or do you want to know how to use a connection pool? If so, the answer depends on the connection pooling API you're using. Fortunately, the documentation is usually available on the website of the API in question.

Or do you want to know when/why to use a connection pool? If so, it will surely improve connection performance if you have a long-lived application (e.g. a web application) that needs to connect to the database frequently. The normal JDBC practice is to acquire and close the Connection, Statement and ResultSet in the shortest possible scope (i.e. inside the very same method block). Because connecting is fairly expensive and can take up to 200ms or even more, using a connection pool is much faster. It hands out connections on demand and takes care of actually closing them. That does not, however, mean that you may change the way you write JDBC code: you still need to acquire and close these resources in the shortest possible scope. The only thing you need to change is the way you acquire the connection. E.g. change from

connection = DriverManager.getConnection(url);

to

connection = connectionPool.getConnection();

No more changes are needed as long as your JDBC code is well-written.
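As a sketch of what "well-written" can look like, the example below acquires and closes everything inside one method using try-with-resources (Java 7+). It assumes the pool is exposed as a `javax.sql.DataSource`, which DBCP and C3P0 both support. `UserDao`, `findName`, and the `users` table with its `name` and `id` columns are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

/** Hypothetical DAO; the DataSource is assumed to be backed by a connection pool. */
class UserDao {
    private final DataSource dataSource;

    UserDao(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    /** Returns the name for the given id, or null if there is no such row. */
    String findName(long id) throws SQLException {
        // Resources are closed in reverse order when the block exits,
        // whether normally or through an exception.
        try (Connection connection = dataSource.getConnection();
             PreparedStatement statement =
                 connection.prepareStatement("SELECT name FROM users WHERE id = ?")) {
            statement.setLong(1, id);
            try (ResultSet resultSet = statement.executeQuery()) {
                return resultSet.next() ? resultSet.getString(1) : null;
            }
        }
    }
}
```

With a pooled `DataSource`, `close()` on the connection typically returns it to the pool rather than closing the physical connection, so the calling code stays the same with or without a pool.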

BalusC