I've recently been working with our Passenger setup and monitoring our app via NewRelic's RPM. Over the last week I've noticed that the production version of our app restarts about once an hour. It doesn't track to exactly once an hour; it seems random, and as far as I can tell it only happens during the day (though there are seldom requests at night, so I would never see the startup blip then). The other sites on the same box do not restart.

Taking a look at passenger-status I see this:

----------- Domains -----------
/web/marketing/current: 
  PID: 2897    Sessions: 0    Processed: 178     Uptime: 22h 35m 58s

/web/demo/current: 
  PID: 11664   Sessions: 0    Processed: 58      Uptime: 17h 14m 59s
  PID: 11026   Sessions: 0    Processed: 20      Uptime: 17h 50m 21s

/web/production/current: 
  PID: 20103   Sessions: 0    Processed: 12      Uptime: 9m 49s
  PID: 20107   Sessions: 0    Processed: 3       Uptime: 9m 49s
  PID: 20099   Sessions: 0    Processed: 20      Uptime: 9m 49s
  PID: 20032   Sessions: 0    Processed: 20      Uptime: 11m 46s
  PID: 20105   Sessions: 0    Processed: 17      Uptime: 9m 49s
  PID: 20101   Sessions: 0    Processed: 2       Uptime: 9m 49s
  PID: 20110   Sessions: 0    Processed: 1       Uptime: 9m 43s
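To make the pattern in that output easier to spot, here is a minimal Python sketch that parses the Domains section and flags applications whose processes all started within a short window of each other. The function names, the 180-second window, and the assumption that uptimes only use `h`, `m` and `s` units are mine, not part of Passenger; the sample text is an excerpt of the output above.

```python
import re
from collections import defaultdict

# Excerpt of the passenger-status output shown in the question.
SAMPLE = """\
/web/demo/current:
  PID: 11664   Sessions: 0    Processed: 58      Uptime: 17h 14m 59s
  PID: 11026   Sessions: 0    Processed: 20      Uptime: 17h 50m 21s

/web/production/current:
  PID: 20103   Sessions: 0    Processed: 12      Uptime: 9m 49s
  PID: 20032   Sessions: 0    Processed: 20      Uptime: 11m 46s
  PID: 20110   Sessions: 0    Processed: 1       Uptime: 9m 43s
"""

UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def uptime_seconds(text):
    """Convert a string such as '17h 14m 59s' to seconds (h/m/s units only)."""
    return sum(int(n) * UNIT_SECONDS[u] for n, u in re.findall(r"(\d+)([hms])", text))


def parse_domains(status_text):
    """Return {app_path: [(pid, uptime_in_seconds), ...]}."""
    apps = defaultdict(list)
    current = None
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("/") and stripped.endswith(":"):
            current = stripped[:-1]  # application root, without the colon
        elif stripped.startswith("PID:") and current is not None:
            match = re.search(r"PID:\s*(\d+).*Uptime:\s*(.+)$", stripped)
            if match:
                apps[current].append((int(match.group(1)), uptime_seconds(match.group(2))))
    return dict(apps)


def restarted_together(procs, window=180):
    """True if there are several processes whose uptimes differ by at most `window` seconds."""
    uptimes = [uptime for _, uptime in procs]
    return len(uptimes) > 1 and max(uptimes) - min(uptimes) <= window


if __name__ == "__main__":
    for app, procs in parse_domains(SAMPLE).items():
        print(app, "restarted together:", restarted_together(procs))
```

Processes with nearly identical uptimes suggest that something restarted the whole application at once, rather than individual workers being recycled one by one.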

Our passenger setup is currently:

PassengerRoot /usr/local/lib/ruby/gems/1.8/gems/passenger-2.2.15
PassengerRuby /usr/local/bin/ruby_gc_wrapper

PassengerMaxPoolSize 20
PassengerUseGlobalQueue on
PassengerStatThrottleRate 120
PassengerPoolIdleTime 0

RailsSpawnMethod smart
RailsFrameworkSpawnerIdleTime 0
RailsAppSpawnerIdleTime 0

and ruby_gc_wrapper looks like:

#!/bin/sh

# wrap ruby with gc tuning parameters

export RUBY_HEAP_MIN_SLOTS=500000
export RUBY_HEAP_SLOTS_INCREMENT=250000
export RUBY_HEAP_SLOTS_GROWTH_FACTOR=1
export RUBY_GC_MALLOC_LIMIT=50000000
export RUBY_HEAP_FREE_MIN=4096
exec "/usr/local/bin/ruby" "$@"

From what I understand, PassengerPoolIdleTime 0 should prevent the app from timing out. The only difference that I know of between the demo instance and the production instance is how much more often the production one is called. I don't have PassengerMaxRequests set anywhere, so I'm baffled as to why it would suddenly restart like this. I've looked at logrotate, monit and others to see whether any outside process is messing with apache2, but if that were happening I'd expect all processes to have the same uptime.

Really rather strange. Any clue?

A: 

After closer inspection, the restarts were far more regular than I initially thought. While they weren't pegged to an exact time, over the last 3 hours they were generally about 1 hour apart and landed around 15 minutes after the hour. It turns out there's one thing that runs at 15 after the hour on that box: chef.
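The same check can be done mechanically: subtract each process's uptime from the time the status was sampled, then see whether the resulting start time falls near a scheduled job's minute. This is a rough sketch only; the sampling timestamp below is hypothetical (the question does not say when passenger-status was run), and the function names and the 3-minute tolerance are my own choices.

```python
from datetime import datetime, timedelta


def start_time(observed_at, uptime_seconds):
    """Estimate when a process started from the sampling time and its uptime."""
    return observed_at - timedelta(seconds=uptime_seconds)


def near_scheduled_minute(started_at, scheduled_minute, tolerance_minutes=3):
    """True if started_at falls within tolerance of a job that runs at
    `scheduled_minute` past every hour (distance wraps around the hour)."""
    offset = (started_at.minute - scheduled_minute) % 60
    return min(offset, 60 - offset) <= tolerance_minutes


if __name__ == "__main__":
    # Hypothetical sampling time, chosen only for illustration.
    observed_at = datetime(2010, 1, 1, 14, 25, 0)
    # Uptimes in seconds: 9m 49s (a production worker) and 17h 14m 59s (a demo worker).
    for uptime in (589, 62099):
        started = start_time(observed_at, uptime)
        print(started, near_scheduled_minute(started, scheduled_minute=15))
```

Running this against several samples taken over a few hours would show whether the start times keep landing near the same minute, which is what pointed to the hourly job here.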

Now the baffling thing is why it would restart only one of the applications and not all of them. That I still don't know, though there are a few possibilities. Either way, I disabled chef's automatic runs (which I prefer not to have in production anyway), and now I have instances that have been up for 5 hours and a tiny response time. Beautiful.

bnferguson