tags:

views:

244

answers:

4

I have an application server (jetty 6 on a linux box) hosting 15 individuals applications (individual war's). Every 3 or 4 days I get an alert from nagios regarding the number of open TCP connections. Upon inspection, I see that the vast majority of these connections are to the MySQL server.

netstat -ntu | grep TIME_WAIT

Shows 10,000+ connections on the MySQL server from the application server (notice the state is TIME_WAIT). If I restart jetty the connections drop to almost zero.

Some interesting values from a show status:

mysql> show status;
+--------------------------+-----------+
| Variable_name            | Value     |
+--------------------------+-----------+
| Aborted_clients          | 244       |
| Aborted_connects         | 695853860 |
| Connections              | 697203154 |
| Max_used_connections     | 77        |
+--------------------------+-----------+

A "show processlist" doesn't show anything out of the ordinary (which is what I would expect since most of the connections are idle - remember the TIME_WAIT state from above).

I have a TEST env for this server but it never has any issues. It obviously doesn't get much traffic and the application server is constantly getting restarted so debugging there isn't much help. I guess I could dig into each individual app and write a load test which would hit the database code, but this would take a lot of time / hassle.

Any ideas how I could track down the application that is grabbing all these connections and never letting go?

A: 

Well, one thing that comes to mind (although I'm not an expert on this) is to increase the logging on mySQL and hunt down all the connect/close messages. If that doesn't work, you can write a tiny proxy to sit in between the actual mySQL server and your suite of applications which does the extra logging and you'll know who is connecting/leaving.

Noufal Ibrahim
I could do this in the TEST env, but then I am back to writing load tests on the db code again (so I can get some activity in the logs). I was hoping for some MySQL magic to track a dead connection to a user / schema / host / etc...
jckdnk111
Why not increase logging on the production server?
Noufal Ibrahim
From my.cnf directly above the logging section "Be aware that this log type is a performance killer." Also, this would require a restart of the PROD MySQL server. Since this db server hosts many other live projects, I can't really afford to mess with it unnecessarily.
jckdnk111
+1  A: 

Possibly the connection pool(s) are misconfigured to hold on to too many connections and they're holding on to too many idle processes.

Aside from that all I can think of is that some piece of code is holding onto a result set, bt that seems less likely. To catch if it's a slow query that's timing out you can also set mysql to write to a slow query log in the conf file, and it'll then write all queries that are taking longer than X seconds (default is 5, I think).

Steve B.
I am logging slow queries and that doesn't really seem to be an issue. I looked at the connection pool configs and they all look pretty sane to me. The pooling mechanisms vary quite a bit (DBCP, butterfly persistence, Hibernate / JPA, beenkeeper, iBatis, etc...) so I'm not entirely confident on my ability to spot a misconfiguration.
jckdnk111
A: 

The answer seems to be adding the following entries in my.cnf under [mysqld] :

wait_timeout=60
interactive_timeout=60

I found it here (all the way at the bottom): http://community.livejournal.com/mysql/82879.html

The default wait time to kill a stale connection is 22800 seconds. To verify:

mysql> show variables like 'wait_%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wait_timeout  | 60    |
+---------------+-------+

EDIT: I forgot to mention, I also added the following to my /etc/sysctl.conf:

net.ipv4.tcp_fin_timeout = 15

This is supposed to help lower the threshold the OS waits before reusing connection resources.

EDIT 2: /etc/init.d/mysql reload won't really reload your my.cnf (see the link below)

jckdnk111
I'm not sure that reload actually reloads the settings without a full restart. Check its behaviour and the documentation.
MarkR
Good point - http://serverfault.com/questions/79043/reload-my-cnf-without-restarting-mysql-service
jckdnk111
A: 

SHOW PROCESSLIST shows the user, host and database for each thread. Unless all of your 15 apps are using the same combination, then you should be able to differentiate using this information.

Phil Wallach
Only for live connections - it doesn't show stale connections.
jckdnk111