Our setup is a standard nginx (version 0.7.59) + thin stack on Debian lenny. Right now we're on one beefy box for web/app and one box for the database. Recently we started noticing that some thins eventually "hang", i.e. they stop receiving requests from nginx. We run 15 thins, and within 10-15 minutes the first one or two are hung; left all day, those same few plus a few more stay hung. The only fix we've found so far is restarting nginx: after a restart, the hung thins immediately start receiving requests again. Because of this, it looks like nginx has taken those thins out of the upstream pool.
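For context, our upstream configuration is roughly the following sketch (names and ports are illustrative; the real file lists all 15 thins):

    # Sketch of our upstream pool (illustrative ports; one entry per thin)
    upstream thin_cluster {
        server 127.0.0.1:3000;
        server 127.0.0.1:3001;
        # ... up through 127.0.0.1:3014 for all 15 thins
    }

    server {
        listen 80;
        location / {
            proxy_pass http://thin_cluster;
        }
    }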
If I understand the docs (http://wiki.nginx.org/NginxHttpUpstreamModule#server) correctly, with the defaults (which we're using), if nginx fails to communicate with a backend server 3 times within 10 seconds, it marks that upstream server "inoperative", waits 10 seconds, and then tries that server again. That makes sense, but we're seeing thins hang indefinitely. I tried setting max_fails to 0 for each of the thins, but that didn't help. I can't figure out what would cause an upstream server to become permanently "inoperative".
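Concretely, the change I tried looked like this (a sketch with illustrative ports; per the docs, max_fails=0 should disable failure accounting, so the server should never be marked down):

    # Attempted workaround: disable failure accounting per upstream server
    upstream thin_cluster {
        server 127.0.0.1:3000 max_fails=0;
        server 127.0.0.1:3001 max_fails=0;
        # ... same for the rest of the pool
    }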
We've seen significant traffic growth recently, so we're not sure whether that's related, or whether the problem was always there and more traffic in a shorter period just makes it surface faster.
Is there something else (a changeable directive or some other condition) in nginx that would cause it to take a server completely out of the pool?