I have a Ruby on Rails website that makes HTTP calls to an external web service.

About once a day I get an error email reporting a SystemExit (stack trace below) where a call to the service has failed. If I then try the exact same query on my site moments later, it works fine. It's been happening since the site went live and I've had no luck tracking down the cause.

Ruby is version 1.8.6 and Rails is version 1.2.6.

Anyone else have this problem?

This is the error and stacktrace.

A SystemExit occurred
/usr/local/lib/ruby/gems/1.8/gems/rails-1.2.6/lib/fcgi_handler.rb:116:in `exit'
/usr/local/lib/ruby/gems/1.8/gems/rails-1.2.6/lib/fcgi_handler.rb:116:in `exit_now_handler'
/usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.4/lib/active_support/inflector.rb:250:in `to_proc'
/usr/local/lib/ruby/1.8/net/protocol.rb:133:in `call'
/usr/local/lib/ruby/1.8/net/protocol.rb:133:in `sysread'
/usr/local/lib/ruby/1.8/net/protocol.rb:133:in `rbuf_fill'
/usr/local/lib/ruby/1.8/timeout.rb:56:in `timeout'
/usr/local/lib/ruby/1.8/timeout.rb:76:in `timeout'
/usr/local/lib/ruby/1.8/net/protocol.rb:132:in `rbuf_fill'
/usr/local/lib/ruby/1.8/net/protocol.rb:116:in `readuntil'
/usr/local/lib/ruby/1.8/net/protocol.rb:126:in `readline'
/usr/local/lib/ruby/1.8/net/http.rb:2017:in `read_status_line'
/usr/local/lib/ruby/1.8/net/http.rb:2006:in `read_new'
/usr/local/lib/ruby/1.8/net/http.rb:1047:in `request'
/usr/local/lib/ruby/1.8/net/http.rb:945:in `request_get'
/usr/local/lib/ruby/1.8/net/http.rb:380:in `get_response'
/usr/local/lib/ruby/1.8/net/http.rb:543:in `start'
/usr/local/lib/ruby/1.8/net/http.rb:379:in `get_response'
+2  A: 

Running Ruby under FCGI is known to be very buggy.

Practically everybody has moved to Mongrel for this reason, and I recommend you do the same.

Michiel de Mare
A: 

Cool, thanks for the tip. I should have suspected FCGI as I had heard it was buggy.

I'll have a play with Mongrel - good to see it can co-exist with Apache too.

Darren Greaves
+3  A: 

It's been a while since I used FCGI, but I think an FCGI process could throw a SystemExit if the thread was taking too long. That could be the web service not responding, or even a slow DNS query. Some Google results show a similar error with Python and FCGI, so moving to Mongrel would be a good idea. This post is the reference I used to set up Mongrel, and I still refer back to it.
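If a slow upstream is the trigger, one mitigation is to make Net::HTTP give up well before the FCGI handler loses patience. This is a minimal sketch, not a drop-in fix: the `build_client` helper, the timeout values, and the example.com URL are all assumptions for illustration. `open_timeout` and `read_timeout` are standard Net::HTTP accessors.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds a client with short, explicit timeouts.
# Nothing is sent over the network until a request is actually made.
def build_client(url, open_timeout = 2, read_timeout = 5)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.open_timeout = open_timeout  # seconds allowed for connecting
  http.read_timeout = read_timeout  # seconds allowed for each read
  http
end

# Assumed URL, for illustration only.
client = build_client('http://example.com/api')
```

With the timeouts set this way, a stalled read should surface as a timeout error in your own code rather than as the process being told to exit.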

Eric

Eric Davis
A: 

Yeah, I remember now - I worked out that it was a system timeout. The timeout should have been caught by my exception-handling code, but it never was.
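One plausible explanation for the handler never firing: a bare `rescue` only catches StandardError and its subclasses, and SystemExit is not one of them. (As far as I know, Timeout::Error in Ruby 1.8 is not a StandardError either, since it descends from Interrupt.) The sketch below shows the difference; both method names are hypothetical.

```ruby
require 'timeout'

# A bare rescue is shorthand for "rescue StandardError".
def with_bare_rescue
  yield
rescue
  :caught
end

# Naming the exception classes explicitly catches the cases
# that a bare rescue lets through.
def with_explicit_rescue
  yield
rescue Timeout::Error
  :timed_out
rescue SystemExit
  :exit_requested
rescue StandardError
  :failed
end
```

Calling `with_bare_rescue { exit }` lets the SystemExit propagate to the caller, while `with_explicit_rescue { exit }` handles it. In real code you would probably log the SystemExit and re-raise it rather than swallow it, since something did ask the process to stop.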

That Mongrel post looks pretty useful. Thanks again.

Darren Greaves
A: 

I would also take a look at Passenger. It's a lot easier to get going than the traditional solution of Apache/nginx + Mongrel.

Simon
A: 

Hi, I've decided to go with Mongrel and now have a working test setup. I can't see how to set each Mongrel instance to write to its own application log file (production.log) though.

Having three different processes writing to the same file seems like a recipe for disaster...

Darren Greaves
+2  A: 

I used to get these all the time on Apache 1 with FastCGI. I think it's caused by FastCGI hanging up before Ruby is done.

Switching to Mongrel is a good first step, but there's more to do. It's a bad idea to pull from web services on live pages, particularly from Rails. Rails is not thread-safe, so each process handles one request at a time. The number of concurrent connections you can support equals the number of Mongrels (or Passenger processes) in your cluster.

If you have one Mongrel and someone accesses a page that calls a web service that takes 10 seconds to time out, every request to your website will time out during that time. Most load balancers just cycle through your Mongrels blindly, so if you have two Mongrels, every other request will time out.
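The "every other request" arithmetic can be checked with a toy model. This is only a sketch under simplifying assumptions: strict round-robin dispatch, one stuck backend, and a hypothetical `blocked_requests` helper that returns the indexes of the requests routed to that backend.

```ruby
# Toy model: request i goes to backend (i % backend_count).
# Returns the request indexes that land on the busy backend.
def blocked_requests(backend_count, busy_backend, request_count)
  (0...request_count).select { |i| i % backend_count == busy_backend }
end

one_backend  = blocked_requests(1, 0, 6)  # every request hits the stuck process
two_backends = blocked_requests(2, 0, 6)  # every second request hits it
```

With a single backend all six requests wait behind the slow call; with two, requests 0, 2 and 4 do.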

Anything that can be unpredictably slow needs to happen in a job queue. The first hit to /slow/action adds the job to the queue. After that, /slow/action keeps checking on it, either by refreshing the page or by polling via Ajax, until the job is finished, and then you get your results from the job queue. There are a few job queues for Rails nowadays, but the oldest and probably most widely used one is BackgroundRB.
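The enqueue-then-poll flow can be sketched with nothing but the standard library. This is not how BackgroundRB works internally; the `JobQueue` class and its methods are hypothetical, and a worker thread inside one process would not be shared across several Mongrels. It only illustrates the shape of the interaction.

```ruby
require 'thread'

# Hypothetical in-process job queue with a single worker thread.
class JobQueue
  def initialize
    @jobs = Queue.new
    @results = {}
    @lock = Mutex.new
    @next_id = 0
    @worker = Thread.new { work_loop }
  end

  # First hit: hand over the slow work and get a job id back immediately.
  def enqueue(&work)
    id = @lock.synchronize { @next_id += 1 }
    @jobs << [id, work]
    id
  end

  # Each refresh or Ajax poll asks whether the job is done yet.
  def finished?(id)
    @lock.synchronize { @results.key?(id) }
  end

  # The stored return value, or the exception the job raised.
  def result(id)
    @lock.synchronize { @results[id] }
  end

  private

  # Runs jobs one at a time; a failing job stores its exception as the result.
  def work_loop
    loop do
      id, work = @jobs.pop
      value = begin
        work.call
      rescue StandardError => e
        e
      end
      @lock.synchronize { @results[id] = value }
    end
  end
end
```

The controller action would call `enqueue` once, keep the returned id in the session or the URL, and render a "still working" page until `finished?` returns true.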

Another alternative, depending on the nature of your app, is to poll the service every N minutes via cron, cache the data locally, and have your live page read from the cache.
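A rough sketch of the cron-plus-cache approach, assuming a single cache file on local disk. The path, the `refresh_cache` and `read_cache` helpers, and the 10-minute freshness window are all made up for illustration.

```ruby
require 'tmpdir'

# Hypothetical cache location.
CACHE_PATH = File.join(Dir.tmpdir, 'service_cache.bin')

# Run from cron: the block does the slow web service call.
def refresh_cache(path = CACHE_PATH)
  data = yield
  tmp = "#{path}.tmp"
  File.open(tmp, 'wb') { |f| Marshal.dump(data, f) }
  File.rename(tmp, path)  # swap in the new file in one step
  data
end

# Run from the live page: returns nil if the cache is missing or stale.
def read_cache(path = CACHE_PATH, max_age = 600)
  return nil unless File.exist?(path)
  return nil if Time.now - File.mtime(path) > max_age
  File.open(path, 'rb') { |f| Marshal.load(f) }
end
```

Writing to a temporary file and renaming it means the live page should never read a half-written cache, and a `nil` from `read_cache` gives the page a clear signal to show a fallback.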

cpm
A: 

@cpm
Hi, thanks for the info.
Unfortunately, due to the nature of the app there's no way of knowing what data will be required until the page is requested so pre-caching is not an option.

You can see the site here (now running on a Mongrel "pack") by the way...

Darren Greaves