We're running Ruby workers across a large number of machines using Resque. Every once in a while, we see segmentation faults in our Resque worker processes. It's hard to debug these, because they're fairly rare, and we must run tens of thousands of distributed jobs to trigger a crash.

Ideally, we'd like to capture backtraces and core files after each crash, and automatically upload them to a central server. In other words, we're looking for something like GNOME's "Bug Buddy", but completely automated and able to catch faults when the Ruby interpreter dumps core. (Similar GUI-based products include Mac OS X Crash Reporter, Windows Error Reporting, KDE's Dr. Konqi, and Mozilla's Breakpad. But we need something which runs on a headless, unattended server.)
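To make the requirement concrete, here is a minimal sketch of the detection half only: a supervisor that launches a worker command and tells a signal-induced crash apart from an ordinary non-zero exit. The function name `run_and_classify` and the set of signals treated as crashes are our own assumptions for illustration, not part of Resque or Apport. SIGABRT is included on the assumption that the interpreter aborts after printing its own crash report. Collecting the core file and uploading it are deliberately left out.

```python
import signal
import subprocess
import sys

# Signals we assume indicate a crash of the interpreter itself,
# as opposed to a Ruby-level exception (illustrative choice).
CRASH_SIGNALS = {
    signal.SIGSEGV,
    signal.SIGBUS,
    signal.SIGABRT,
    signal.SIGILL,
    signal.SIGFPE,
}


def run_and_classify(argv):
    """Run a worker command and report how it ended.

    Returns a (kind, detail) tuple where kind is "ok", "error", or "crash".
    """
    code = subprocess.run(argv).returncode
    if code == 0:
        return ("ok", 0)
    if code < 0:
        # On POSIX, subprocess reports death-by-signal as a negative return code.
        name = signal.Signals(-code).name
        kind = "crash" if -code in CRASH_SIGNALS else "error"
        return (kind, name)
    # Positive exit status: the process exited on its own terms.
    return ("error", code)


if __name__ == "__main__":
    # Hypothetical usage: pass the worker command line as arguments.
    print(run_and_classify(sys.argv[1:]))
```

This only works on POSIX systems, and it only tells us that a crash happened. It does not give us the backtrace or the core file, which is the part we are actually missing.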

So far, the most promising option appears to be Ubuntu's Apport, which can intercept crashes in server processes and save them to disk. Apport normally uploads crashes to Ubuntu's Launchpad, but it also comes with a Python library.

Does anyone have any recommendations or first-hand experience using Apport, its Python library, or a similar tool for this? I'm asking here rather than on Server Fault because the solutions are likely to involve programming or code changes.

A: 

Check out Hoptoad, getExceptional, and New Relic. All are SaaS options that will do this for you (New Relic will also track performance). If you want to roll your own, try the exception_notification gem.

Mark Thomas
We already have tools for catching Ruby-level exceptions. We need tools to catch segmentation faults and bus errors in the Ruby process itself. Do any of the tools you suggested do this? Googling for "Hoptoad segmentation fault" does not turn up any instructions.
emk
No, unfortunately these require the Ruby interpreter to still be running. Sorry I missed the "segmentation fault" part. At this point, it's really not a Ruby question anymore. Server Fault may be your best bet.
Mark Thomas