I have a high-traffic application on a single Debian machine, and Apache has started acting strangely.

Every time I start Apache, it spawns a huge number of processes, the app doesn't load at all, and the whole machine quickly freezes and has to be power-cycled.

Here is the top output immediately after starting Apache:

top - 20:14:44 up  1:16,  2 users,  load average: 0.48, 0.10, 0.03
Tasks: 330 total,   5 running, 325 sleeping,   0 stopped,   0 zombie
Cpu(s): 12.0%us, 21.4%sy,  0.0%ni, 65.7%id,  0.2%wa,  0.1%hi,  0.7%si,  0.0%st
Mem:   8179920k total,   404984k used,  7774936k free,    60716k buffers
Swap:  2097136k total,        0k used,  2097136k free,    43424k cached


10251 www-data  15   0  467m 8100 4016 S    6  0.1   0:00.04 apache2
10262 www-data  15   0  467m 8092 4012 S    6  0.1   0:00.05 apache2
10360 www-data  15   0  468m 8296 4016 S    6  0.1   0:00.05 apache2
10428 www-data  15   0  468m 8272 3992 S    6  0.1   0:00.05 apache2
10241 www-data  15   0  467m 8256 4012 S    4  0.1   0:00.03 apache2
10259 www-data  15   0  467m 8092 4012 S    4  0.1   0:00.04 apache2
10274 www-data  15   0  467m 8056 4012 S    4  0.1   0:00.03 apache2
10291 www-data  15   0  468m 8292 4012 S    4  0.1   0:00.03 apache2
10293 www-data  15   0  468m 8292 4012 S    4  0.1   0:00.03 apache2
10308 www-data  15   0  468m 8296 4016 S    4  0.1   0:00.02 apache2
10317 www-data  15   0  468m 8292 4012 S    4  0.1   0:00.02 apache2
10320 www-data  15   0  468m 8292 4012 S    4  0.1   0:00.04 apache2
10325 www-data  15   0  468m 8292 4012 S    4  0.1   0:00.04 apache2

And so on, with many more apache2 processes.

Less than a minute later, as you can see below, the load average has gone from 0.48 to 2.17 and the task count from 330 to 1850. If I do not stop Apache at this point, the load keeps rising for a few minutes or less until the machine dies.

top - 20:15:34 up  1:17,  2 users,  load average: 2.17, 0.62, 0.21
Tasks: 1850 total,   5 running, 1845 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.3%us,  2.1%sy,  0.0%ni, 96.4%id,  0.0%wa,  0.1%hi,  1.0%si,  0.0%st
Mem:   8179920k total,  1938524k used,  6241396k free,    60860k buffers
Swap:  2097136k total,        0k used,  2097136k free,    44196k cached

We have a firewall where we whitelist the addresses we know are allowed to hit our site.

Any ideas about what the problem might be are very welcome.

Thanks!

+7  A: 

Have you changed your configuration file recently? If yes, I trust you keep the old version for diffing?

If not, search for the "StartServers", "MaxSpareServers" and "MinSpareServers" directives. Generally you want to leave these at defaults, but it's possible that they were intentionally set high (bad idea) or accidentally set that way due to a bad config edit.
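If you want a quick way to see what those directives are currently set to, something like the following rough sketch could help. The function name is my own, and I have added MaxClients to the list as an assumption, since it caps the process count under prefork. It only looks at the text you hand it, so Include'd files would need to be read separately.

```python
import re

# Directives named above, plus MaxClients (my addition, not from the text above).
DIRECTIVES = ("StartServers", "MinSpareServers", "MaxSpareServers", "MaxClients")

# Apache directive names are case-insensitive; lines starting with '#' are
# comments and do not match because the directive must come first on the line.
_PATTERN = re.compile(
    r"^\s*(%s)\s+(\d+)\s*$" % "|".join(DIRECTIVES), re.IGNORECASE
)


def find_process_directives(config_text):
    """Return (line_number, directive, value) for each matching config line."""
    found = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        match = _PATTERN.match(line)
        if match:
            found.append((number, match.group(1), int(match.group(2))))
    return found
```

Read your config file (on Debian it is commonly somewhere under /etc/apache2/, but check your own setup) and pass its contents in; any unexpectedly large value is worth comparing against the old version.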

If this doesn't help, it's time to look outside Apache, for some process that's opening connections at a fast rate (could be that there's a testing process that's run amok).

First step is the access log. Second step is to run netstat, to see where the connections might be coming from. And if the client is running on the same system, you can look in /proc/*/fd to find the two ends of the connection.
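For the netstat step, a small sketch like this can tally connections per remote address. It assumes the usual column layout of `netstat -tn` on Linux (proto, recv-q, send-q, local address, foreign address, state) and that Apache listens on port 80; both are assumptions, and the function name is made up for illustration.

```python
from collections import Counter


def count_remote_hosts(netstat_output, local_port=80):
    """Count TCP connections to local_port, grouped by remote address.

    Returns a list of (remote_address, count), busiest first.
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP data row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign = fields[3], fields[4]
        # The port is whatever follows the last ':' in the address column.
        if local.rsplit(":", 1)[-1] != str(local_port):
            continue
        counts[foreign.rsplit(":", 1)[0]] += 1
    return counts.most_common()
```

Capture the output of `netstat -tn` however you like and pass it in as a string. One address with hundreds of connections points at a runaway client rather than at Apache itself.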

kdgregory
I strongly second keeping the old version. We even keep /etc/ in our SCM.
mark
+1  A: 

You have probably made the error of configuring Apache to use far more RAM than you have. This is an easy mistake to make.

I am assuming you are using a Prefork Apache, and an in-process application server (such as PHP or mod_perl). In this model, you will end up with a maximum of (MaxClients * max memory usage of your application per process) memory used. If you don't have nearly that much, it's time to decrease one, the other or both.

In the general case, this means decreasing MaxClients to the point where your server has enough RAM to cope.

The default MaxClients (150 is typical) is not suitable for running a heavyweight in-process application server on a modest machine if you are using the Prefork model (most application servers either don't support, or discourage, the use of threaded models).

However, decreasing MaxClients will eventually cause the application to become unavailable, particularly if you have keepalives on and the keepalive timeout is too long. Processes which are just keeping a connection alive (state K in server-status) still use a lot of RAM, and that may be a problem - try to minimise the keepalive timeout, or turn keepalives off altogether.
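The arithmetic is simple enough to sketch. The function names and the `reserved_kb` parameter (memory held back for the OS, database, and so on) are my own inventions, and the per-process figure has to come from measuring your application under load.

```python
def worst_case_memory_kb(max_clients, per_process_kb):
    """Upper bound on Apache memory use under prefork: every slot in use."""
    return max_clients * per_process_kb


def max_clients_for(total_ram_kb, per_process_kb, reserved_kb=0):
    """Largest MaxClients whose worst case fits in RAM after the reservation."""
    usable_kb = total_ram_kb - reserved_kb
    if per_process_kb <= 0 or usable_kb <= 0:
        return 0
    return usable_kb // per_process_kb
```

As a purely illustrative example: with the 8179920k of RAM from the question, 1 GB (1048576k) held back, and a hypothetical 50 MB (51200k) per process, `max_clients_for(8179920, 51200, 1048576)` gives 139, while the typical default of 150 would need `worst_case_memory_kb(150, 51200)` = 7680000k.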

You need to keep an eye on server-status (as provided by mod_status).
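If you would rather not eyeball the page, mod_status also has a machine-readable form (server-status?auto) that includes a "Scoreboard:" line with one character per slot. The sketch below, with function names of my own choosing, counts those characters and reports what share of live processes are sitting in keepalive. It assumes you have already fetched the status text yourself.

```python
from collections import Counter


def parse_scoreboard(status_text):
    """Return a Counter of scoreboard characters from server-status?auto text."""
    for line in status_text.splitlines():
        if line.startswith("Scoreboard:"):
            return Counter(line.split(":", 1)[1].strip())
    return Counter()


def keepalive_fraction(status_text):
    """Fraction of existing processes in state K; '.' marks a slot with no process."""
    counts = parse_scoreboard(status_text)
    processes = sum(n for state, n in counts.items() if state != ".")
    if processes == 0:
        return 0.0
    return counts["K"] / processes
```

A consistently high fraction suggests the keepalive timeout is tying up slots that could be serving requests.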

Of course you should only make ANY of these changes if you understand the consequences. Think twice, change the config once. If you have ANY ability to test the changes with simulated load on a similar spec non-production machine, do so.

MarkR
A: 

As has been said (assuming Prefork Apache) - MaxClients = max processes at once.

If you find you are getting hammered with real traffic (and not a mis-configured StartServers/Min/MaxSpareServers), there are some other things you can do:

  1. Set up a separate, lightweight Apache process (or lighttpd) for your static content. That way all the small, static stuff doesn't "pollute" your heavyweight app process. This can be on the same server, or a different one. Doesn't matter.
  2. Put a reverse proxy like Squid in front of your Apache process. The reverse proxy will quickly suck down the content from Apache, store it in memory, and then parcel it back out to the client. This way AOL users on 14.4 kbps modems don't hog one of your valuable Apache slots. As a bonus, such a setup can be configured to cache some of your content to reduce the load on your Apache processes.
Cory R. King
A: 

Your 'top' output shows that you have plenty of free memory, so I don't think that MaxClients is the issue (unless there is some problem with Apache allocating more than 2 GB of memory?). Your error log should show errors if it is having problems creating more children.

Most likely, your Apache processes really are using a lot of resources. If you are running PHP apps, try installing eAccelerator which does a good job of optimizing and caching PHP code. Other things might include heavy MySQL queries, a slow DNS resolver, etc. Beyond that, it gets more into understanding what programs are being hit and what they are doing.

Brandon