views: 327
answers: 1

We are getting the following error on a certain database occasionally under moderate load.

"System.InvalidOperationException: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached."


I have combed through the code and we are closing the connections in finally blocks as we should, except in a few cases which we have established are called very infrequently. We will fix those pieces of code in our next release, but to address the current production issue I am suggesting increasing the max pool size to 300. The maximum number of concurrent users we are currently seeing is around 110, which is obviously over the default pool size of 100.
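
For reference, this is the shape of the data access pattern we are standardizing on, with the raised pool ceiling in the connection string (the server, database, and table names below are made up for illustration):

    using System.Data.SqlClient;

    class OrderRepository
    {
        // Placeholder connection string; Max Pool Size=300 is the proposed increase.
        const string ConnectionString =
            "Data Source=OurServer;Initial Catalog=OurDatabase;" +
            "Integrated Security=SSPI;Max Pool Size=300;";

        public int CountOrders()
        {
            // 'using' disposes the connection even if the command throws,
            // so it goes back to the pool immediately.
            using (SqlConnection connection = new SqlConnection(ConnectionString))
            using (SqlCommand command = new SqlCommand(
                "SELECT COUNT(*) FROM dbo.Orders", connection))
            {
                connection.Open();
                return (int)command.ExecuteScalar();
            }
        }
    }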

I am also suggesting that we make sure all our connection strings to a particular SQL Server instance are identical, to avoid creating multiple connection pools unnecessarily. I am hoping we can issue a USE [Database] statement before our actual SQL queries when we need to switch databases within a single SQL Server instance.
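
Something along these lines is what I have in mind: a single connection string per instance so all code shares one pool, and a USE before queries that target another database (the database and table names are placeholders, not our real schema):

    using System.Data.SqlClient;

    class CrossDatabaseQuery
    {
        // One connection string for the whole instance so everything shares a single pool.
        const string ConnectionString =
            "Data Source=OurServer;Initial Catalog=MainDb;" +
            "Integrated Security=SSPI;Max Pool Size=300;";

        public object GetLatestMetric()
        {
            using (SqlConnection connection = new SqlConnection(ConnectionString))
            using (SqlCommand command = new SqlCommand(
                "USE [ReportingDb]; SELECT TOP 1 Value FROM dbo.Metrics ORDER BY LoggedAt DESC;",
                connection))
            {
                connection.Open();
                return command.ExecuteScalar();
            }
        }
    }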

Do you guys have any ideas, pointers, suggestions, or gotchas for us to watch out for?

+1  A: 

You must eliminate the connection leaks. If the cause of the pool exhaustion is leaks, increasing the pool size to 300 is just going to delay the inevitable. If you leak one connection in 10,000 calls (i.e. "very infrequently") and you have 110 concurrent requests at, say, 5 seconds a call, you are leaking about one connection every 8 minutes, which will drain a 100-connection pool in roughly 13 hours. The timeouts will start showing up much earlier than that, though, as the available pool shrinks.
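
The arithmetic behind those numbers, using the figures above (assumptions, not measurements):

    class LeakRateEstimate
    {
        static void Main()
        {
            double callsPerSecond = 110 / 5.0;                 // 110 concurrent calls, ~5 s each => ~22 calls/s
            double secondsPerLeak = 10000 / callsPerSecond;    // one leak per 10,000 calls => ~455 s (~8 minutes)
            double hoursToDrain = 100 * secondsPerLeak / 3600; // default pool of 100 => drained in ~12.6 hours
            System.Console.WriteLine("~{0:F0} s per leaked connection, pool drained in ~{1:F1} h",
                secondsPerLeak, hoursToDrain);
        }
    }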

If you have hard evidence that it's not the leaks that are the root cause but indeed the rate of calls vs. the pool size, then you should increase the pool size. Whatever pool size you decide to use, if each request holds a connection 1:1 for its whole duration, then you need to throttle/queue the HTTP accepts so their count does not exceed your pool size. If you don't, you can still encounter spikes that exhaust the pool.
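
To make the 1:1 cap concrete, an application-level gate would look roughly like this (a sketch only; the cap of 300 is your proposed pool size, and as I say below I would rather enforce the limit at the HTTP accept level):

    using System;
    using System.Threading;

    static class DbGate
    {
        // Never allow more in-flight database work than there are pooled connections.
        private static readonly Semaphore Gate = new Semaphore(300, 300);

        public static T Run<T>(Func<T> dbWork)
        {
            Gate.WaitOne();          // queue here instead of timing out in the pool
            try { return dbWork(); }
            finally { Gate.Release(); }
        }
    }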

Also, you may consider using a more resilient connection factory, one that retries and attempts a non-pooled connection if the pool is drained. Of course this goes hand in hand with my prior point: if you calibrate your max HTTP accept count to match the pool size, the pool cannot be exhausted (unless you leak, which puts you back to square one). I would not recommend this though; I think it is much better to queue up requests in http.sys territory than in the application resource allocation territory (i.e. throttle the max accepted HTTP calls).
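
If you do go that route, the factory would look something like this (a sketch; the retry count, the delays and the Pooling=false fallback are my assumptions, not a library API):

    using System;
    using System.Data.SqlClient;
    using System.Threading;

    static class ResilientConnectionFactory
    {
        public static SqlConnection Open(string connectionString)
        {
            // Retry the pooled open a few times before giving up on the pool.
            for (int attempt = 1; attempt <= 3; attempt++)
            {
                try
                {
                    SqlConnection connection = new SqlConnection(connectionString);
                    connection.Open();
                    return connection;
                }
                catch (InvalidOperationException) // pool timeout, as in the error above
                {
                    Thread.Sleep(250 * attempt);
                }
            }

            // Last resort: bypass the pool entirely for this one call.
            SqlConnectionStringBuilder builder =
                new SqlConnectionStringBuilder(connectionString) { Pooling = false };
            SqlConnection fallback = new SqlConnection(builder.ConnectionString);
            fallback.Open();
            return fallback;
        }
    }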

And last but not least, reduce the duration of each call. If a call takes 5 seconds on average, then you're seeing 110 concurrent connections at a mere 22 requests per second. If you reduce the duration of each call to 1 second by eliminating SQL bottlenecks, you'll be able to service 110 requests per second before hitting the same resource cap (110 concurrent requests): a five-fold traffic increase. The biggest culprit is usually table scans; make sure all your queries use sensible SQL and have an optimal data access path. As David says, SQL Profiler is your friend.

You can also use SqlConnection.ChangeDatabase to change the database.
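
That keeps the one-connection-string-per-instance scheme while still switching context, for example (database and table names are placeholders):

    using System.Data.SqlClient;

    class ChangeDatabaseExample
    {
        static void QueryOtherDatabase(string connectionString)
        {
            using (SqlConnection connection = new SqlConnection(connectionString))
            {
                connection.Open();
                connection.ChangeDatabase("ReportingDb"); // same effect as USE [ReportingDb]
                using (SqlCommand command = new SqlCommand(
                    "SELECT TOP 1 Value FROM dbo.Metrics", connection))
                {
                    object value = command.ExecuteScalar();
                }
            }
        }
    }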

Remus Rusanu
The real problem was an obscure connection leak. Thanks for the detailed comments.
Arsalan Ahmed