My mongrels were not responding, and neither god restart nor cluster restart made much difference. I dug a little deeper, and then I realized that I had plenty of zombie processes:

app 29607 27948 0 19:45 ? 00:00:00 [mongrel_rails]
app 30578 21681 0 19:52 ? 00:00:00 [mongrel_rails]
app 30704 21405 0 19:53 ? 00:00:00 [mongrel_rails]

However, after I killed the parent processes using this:

"ps -ef | grep defunct | grep -v grep | awk '{print $3}' | xargs kill -9"

the restarts began working. Did killing the zombies help the restarts work? If so, it is weird, because I cannot find any references that explain how defunct processes affect normal ones.
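The ps | grep defunct | awk '{print $3}' pipeline works because the third column of ps -ef is the parent PID (PPID). Before reaching for kill -9, it can help to look at the zombies and their parents first. Below is a rough, list-only Python equivalent for Linux. It reads /proc/<pid>/stat directly instead of parsing ps output, and list_zombies and zombie_parents are illustrative names, not part of any tool.

```python
import os

def list_zombies(proc_root="/proc"):
    """Return (pid, ppid, comm) for every process whose state is 'Z' (defunct)."""
    zombies = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/self or /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                data = f.read()
        except OSError:
            continue  # the process disappeared between listdir() and open()
        # Format: "pid (comm) state ppid ...". comm may contain spaces or ')',
        # so split around the last ')' instead of on whitespace.
        rpar = data.rindex(")")
        comm = data[data.index("(") + 1:rpar]
        state, ppid = data[rpar + 2:].split()[:2]
        if state == "Z":
            zombies.append((int(entry), int(ppid), comm))
    return zombies

def zombie_parents(proc_root="/proc"):
    """Distinct parent PIDs: the processes that ought to be reaping the zombies."""
    return sorted({ppid for _pid, ppid, _comm in list_zombies(proc_root)})

if __name__ == "__main__":
    for pid, ppid, comm in list_zombies():
        print(f"defunct {pid} [{comm}] parent {ppid}")
    print("parents:", zombie_parents())
```

Running it as a script prints each defunct process with its parent; the parent PIDs it reports are the ones the one-liner above would have killed.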

A: 

It is possible for zombie processes to prevent new processes from being created. Linux limits the number of process IDs, and a zombie keeps its PID until its parent reaps it; once all of them are in use, you won't be able to create new processes.

On a relatively modern Linux (openSUSE 11.1), the default is 32,768 processes.
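On Linux the PID ceiling can be read from /proc/sys/kernel/pid_max, and every process that still has a directory under /proc, zombies included, is holding one of those PIDs. A rough sketch for comparing the two follows. It counts processes only (threads also consume PIDs but are not listed at the top level of /proc), and the helper names are made up for illustration.

```python
import os

def read_pid_max(path="/proc/sys/kernel/pid_max"):
    """Return pid_max; the kernel allocates PIDs below this value."""
    with open(path) as f:
        return int(f.read().strip())

def count_pids_in_use(proc_root="/proc"):
    """Count process directories in /proc; a zombie keeps its entry until reaped."""
    return sum(1 for entry in os.listdir(proc_root) if entry.isdigit())

if __name__ == "__main__":
    print(f"processes: {count_pids_in_use()}, pid_max: {read_pid_max()} (threads not counted)")
```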

On a side note, you can't kill a zombie process, as it has already exited. If the parent process does not reap its children, you'll need to kill the parent process so that init can reap the zombies.
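The reaping step can be observed directly with a small Linux-only Python sketch. It forks a child that exits at once, shows that the child sits in state Z, that SIGKILL does not remove it, and that the entry disappears only once the parent calls waitpid(). Here the parent reaps its own child, which stands in for what init does with orphaned zombies after their parent has been killed. proc_state and demo are illustrative names.

```python
import os
import signal
import time

def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the PID is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except OSError:
        return None
    return data[data.rindex(")") + 2]

def demo():
    # If SIGCHLD were ignored, the kernel would auto-reap children and no zombie would appear.
    signal.signal(signal.SIGCHLD, signal.SIG_DFL)
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child exits immediately; the parent deliberately does not wait yet
    for _ in range(100):  # poll until the kernel marks the exited child as a zombie
        if proc_state(pid) == "Z":
            break
        time.sleep(0.01)
    before = proc_state(pid)
    os.kill(pid, signal.SIGKILL)  # succeeds but changes nothing: the child already exited
    time.sleep(0.05)
    after_kill = proc_state(pid)
    os.waitpid(pid, 0)  # reaping is what actually frees the process table entry
    after_reap = proc_state(pid)
    return before, after_kill, after_reap

if __name__ == "__main__":
    print(demo())
```

On Linux this should return ('Z', 'Z', None): the kill has no effect, and only the wait removes the zombie.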

On further thought, you're probably not hitting the total machine process limit but the per-user ulimit process limit. This limit is usually lower than the total OS limit. To find out what your limit is, run ulimit -u.
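ulimit -u reports the soft RLIMIT_NPROC limit, which Python exposes through the standard resource module. The sketch below compares it with the number of processes owned by the current user. It is only an approximation: on Linux the limit actually counts threads and is keyed on the real UID, while /proc/<pid> is owned by the effective UID. The helper names are hypothetical.

```python
import os
import resource

def nproc_soft_limit():
    """Soft RLIMIT_NPROC, i.e. the value `ulimit -u` prints; None means unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return None if soft == resource.RLIM_INFINITY else soft

def count_user_processes(uid=None, proc_root="/proc"):
    """Count /proc/<pid> entries owned by uid (zombies included, threads not)."""
    # /proc/<pid> is owned by the process's effective UID, so default to ours.
    uid = os.geteuid() if uid is None else uid
    count = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            if os.stat(os.path.join(proc_root, entry)).st_uid == uid:
                count += 1
        except OSError:
            continue  # the process exited while we were scanning
    return count

if __name__ == "__main__":
    limit = nproc_soft_limit()
    shown = "unlimited" if limit is None else limit
    print(f"uid {os.geteuid()}: {count_user_processes()} processes, limit {shown}")
```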

R Samuel Klatchko
A: 

I believe the zombie processes are holding on to the predefined set of sockets (e.g. port 8000), and hence preventing the mongrel processes from restarting properly.
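One way to test this theory is to check whether the mongrel ports can still be bound before restarting. The Python sketch below only answers "is something still listening on this port?", not which process holds it. If the port turns out to be busy, keep in mind that the kernel closes a process's file descriptors when it exits, so the holder is more likely a still-running process, such as one of the parents, than the zombie entry itself. port_is_free is a hypothetical helper name.

```python
import socket

def port_is_free(port, host="0.0.0.0"):
    """Return True if a new TCP listener could bind host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Many servers set SO_REUSEADDR, so TIME_WAIT leftovers should not count as busy.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
        except OSError:
            return False  # e.g. EADDRINUSE: some process still holds the port
        return True

if __name__ == "__main__":
    # 8000 is the example port from the answer; adjust to your cluster's ports.
    print("port 8000 free:", port_is_free(8000))
```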

ambivalence