I have a cluster of three Mongrels running under nginx, and I deploy the app with Capistrano 2.4.3. When I run "cap deploy" against a running system, the behavior is:
- The app is deployed. The code is successfully updated.
In the cap deploy output, there is this:
    executing "sudo -p 'sudo password: ' mongrel_rails cluster::restart -C /var/www/rails/myapp/current/config/mongrel_cluster.yml"
    servers: ["myip"]
    [myip] executing command
    ** [out :: myip] stopping port 9096
    ** [out :: myip] stopping port 9097
    ** [out :: myip] stopping port 9098
    ** [out :: myip] already started port 9096
    ** [out :: myip] already started port 9097
    ** [out :: myip] already started port 9098
- I check immediately on the server and find that Mongrel is still running, and the PID files are still present for the previous three instances.
- A short time later (less than one minute), I find that Mongrel is no longer running, the PID files are gone, and it has failed to restart.
- If I start mongrel on the server by hand, the app starts up just fine.
It seems like 'mongrel_rails cluster::restart' isn't properly waiting for a full stop before attempting a restart of the cluster. How do I diagnose and fix this issue?
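In the meantime, one workaround I'm considering is overriding Capistrano's deploy:restart so it stops the cluster explicitly, pauses, and then starts it again. This is only a sketch of the idea (the explicit stop/pause/start and the 10-second sleep are my own guesses, not something mongrel_cluster provides; the config path is the one from the output above):

    # config/deploy.rb -- workaround sketch, not my actual recipe
    namespace :deploy do
      task :restart, :roles => :app do
        conf = "/var/www/rails/myapp/current/config/mongrel_cluster.yml"
        sudo "mongrel_rails cluster::stop -C #{conf}"
        sleep 10   # crude: give the old mongrels time to exit and remove their PID files
        sudo "mongrel_rails cluster::start -C #{conf}"
      end
    end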
EDIT: Here's the answer:
mongrel_cluster, in the "restart" task, simply does this:
    def run
      stop
      start
    end
It doesn't do any waiting or checking to see that the processes have exited before invoking "start". This is a known bug with an outstanding patch submitted; I applied the patch to Mongrel Cluster and the problem disappeared.
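For anyone hitting the same problem, the idea behind the fix is simply to wait until the old processes have actually exited (their PID files are gone) before calling start. Here is a rough illustration of that idea in plain Ruby; this is my own sketch, not the actual patch, and the PID file locations are assumptions you should check against the pid_file setting in mongrel_cluster.yml:

    # Sketch only: stop the cluster, wait for the old PID files to disappear
    # (with a timeout), then start the cluster again.
    CONF      = "/var/www/rails/myapp/current/config/mongrel_cluster.yml"
    PID_FILES = (9096..9098).map do |port|
      "/var/www/rails/myapp/current/tmp/pids/mongrel.#{port}.pid"  # assumed location
    end

    def wait_for_stop(pid_files, timeout = 30)
      deadline = Time.now + timeout
      while pid_files.any? { |f| File.exist?(f) }
        raise "old mongrels did not stop within #{timeout}s" if Time.now > deadline
        sleep 1
      end
    end

    system("mongrel_rails cluster::stop -C #{CONF}")
    wait_for_stop(PID_FILES)
    system("mongrel_rails cluster::start -C #{CONF}")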