I have a cluster of three mongrels running under nginx, and I deploy the app using Capistrano 2.4.3. When I run "cap deploy" against a running system, the behavior is:

  1. The app is deployed. The code is successfully updated.
  2. In the cap deploy output, there is this:

    • executing "sudo -p 'sudo password: ' mongrel_rails cluster::restart -C /var/www/rails/myapp/current/config/mongrel_cluster.yml"
    • servers: ["myip"]
    • [myip] executing command
    • ** [out :: myip] stopping port 9096
    • ** [out :: myip] stopping port 9097
    • ** [out :: myip] stopping port 9098
    • ** [out :: myip] already started port 9096
    • ** [out :: myip] already started port 9097
    • ** [out :: myip] already started port 9098
  3. I check immediately on the server and find that Mongrel is still running, and the PID files are still present for the previous three instances.
  4. A short time later (less than one minute), I find that Mongrel is no longer running, the PID files are gone, and it has failed to restart.
  5. If I start mongrel on the server by hand, the app starts up just fine.

It seems like 'mongrel_rails cluster::restart' isn't properly waiting for a full stop before attempting a restart of the cluster. How do I diagnose and fix this issue?

EDIT: Here's the answer:

mongrel_cluster, in the "restart" task, simply does this:

def run
  stop
  start
end

It doesn't do any waiting or checking to see that the processes have exited before invoking "start". This is a known bug with an outstanding patch submitted; I applied the patch to mongrel_cluster and the problem disappeared.
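The gist of the patch is to wait for the old mongrels to actually exit (their PID files disappear on a clean shutdown) before calling start. A rough standalone sketch of that idea, not the actual patch; the PID file glob and the 30-second timeout are assumptions that depend on your mongrel_cluster.yml:

require 'timeout'

config   = "/var/www/rails/myapp/current/config/mongrel_cluster.yml"
# Assumption: PID files live under tmp/pids -- check your mongrel_cluster.yml.
pid_glob = "/var/www/rails/myapp/current/tmp/pids/mongrel.*.pid"

system("mongrel_rails cluster::stop -C #{config}")

# Poll until the old mongrels have really exited (their PID files are removed),
# giving up after 30 seconds rather than waiting forever.
Timeout.timeout(30) do
  sleep 0.5 until Dir.glob(pid_glob).empty?
end

system("mongrel_rails cluster::start -C #{config}")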

A: 

I hate to be so basic, but it sounds like the PID files are still hanging around when it tries to start. Make sure Mongrel is stopped by hand, clean up the PID files by hand, and then do a cap deploy.

salt.racer
+1  A: 
Ryan McGeary
Thanks Ryan. I think you got me unstuck. When I get it solved, I'll follow up.
Pete
+4  A: 

You can explicitly tell the mongrel_cluster recipes to remove the PID files before a start by adding the following to your Capistrano recipes:

# helps keep mongrel pid files clean
set :mongrel_clean, true

This causes it to pass the --clean option to mongrel_cluster_ctl.

I went back and looked at one of my deployment recipes and noticed that I had also changed the way my restart task worked. Take a look at the following message in the mongrel users group:

mongrel users discussion of restart

The following is my deploy:restart task. I admit it's a bit of a hack.

namespace :deploy do
  desc "Restart the Mongrel processes on the app server."
  task :restart, :roles => :app do
    mongrel.cluster.stop
    sleep 2.5
    mongrel.cluster.start
  end
end
rwc9u
This is on the right track. See my edit to the question: there's a patch to mongrel_cluster that fixes the behavior.
Pete
+1  A: 

Either way, my mongrels are starting before the previous stop command has finished shutting 'em all down.

sleep 2.5 is not a good solution if it takes longer than 2.5 seconds to halt all of the running mongrels.

There seems to be a need for:

stop && start

vs.

stop; start

(this is how bash works: && runs the second command only if the first finishes without error, while ";" simply runs the next command regardless of how the first one exits).

I wonder if there is a way to do: cluster::stop, wait for the mongrels to fully exit, then cluster::start.
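Something along those lines can be approximated in the deploy recipe itself by polling for the old PID files between stop and start instead of sleeping a fixed amount. A sketch, assuming the PID files end up under shared_path/pids (adjust the path and the 30-second cap to your setup):

namespace :deploy do
  desc "Restart mongrels, waiting for the old processes to exit first."
  task :restart, :roles => :app do
    mongrel.cluster.stop
    # Poll up to 30 seconds for the old PID files to disappear before starting.
    run "for i in $(seq 1 30); do ls #{shared_path}/pids/mongrel.*.pid >/dev/null 2>&1 || break; sleep 1; done"
    mongrel.cluster.start
  end
end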

Pete
A: 

Good discussion: http://www.ruby-forum.com/topic/139734#745030