I have a farm of several physical servers, each running a large number of Ruby "workers" (daemon-like processes), and I'd like to be able to monitor the health and progress of these processes from a central location, perhaps with historical graphing like Cacti provides. What's the simplest, preferably open-standard, protocol for doing something like that? Please note I'm already using monit to keep the processes up and running and under control; what I'm asking for here is a single point of entry (i.e. a dashboard) for checking in on them. Thanks.

A: 

G'day,

What about having a monitoring process on each server that checks the status of each process and then writes that out to a flat text file, say once every five minutes.
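A minimal sketch of the per-server side, in Ruby. It assumes (hypothetically) that each worker writes its PID to `<pid_dir>/<name>.pid`; that layout, the `name UP|DOWN` line format and the `write_status_snapshot` name are illustrative, not part of any existing tool. Signal 0 is used only to test whether a process exists.

```ruby
require "fileutils"

# Returns true if a process with this PID exists. Signal 0 sends nothing;
# the kernel only performs the existence and permission checks.
def alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false
rescue Errno::EPERM
  true # the process exists but belongs to another user
end

# Hypothetical layout: each worker writes its PID to <pid_dir>/<name>.pid.
# Writes one "name UP|DOWN" line per worker to a timestamped flat file in
# out_dir and returns that file's path.
def write_status_snapshot(pid_dir, out_dir, now = Time.now)
  FileUtils.mkdir_p(out_dir)
  lines = Dir.glob(File.join(pid_dir, "*.pid")).sort.map do |path|
    name = File.basename(path, ".pid")
    pid = Integer(File.read(path).strip)
    "#{name} #{alive?(pid) ? 'UP' : 'DOWN'}"
  end
  # Fixed-width UTC timestamps sort lexically in chronological order.
  stamp = now.getutc.strftime("%Y%m%dT%H%M%SZ")
  path = File.join(out_dir, "status-#{stamp}.txt")
  File.write(path, lines.map { |l| "#{l}\n" }.join)
  path
end
```

Running it from cron with a `*/5 * * * *` schedule gives the "once every five minutes" cadence without a long-lived monitor process.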

Then another process located on a central server can retrieve those flat files, trawl through the results and flag any issues.
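A sketch of the central side, again only under assumptions: the per-server files have already been copied (for example with rsync or scp) into `<root>/<hostname>/status-<UTC timestamp>.txt`, each holding one `name STATUS` line per worker. The layout and the `flag_issues` name are hypothetical.

```ruby
# Newest snapshot in a host directory; fixed-width UTC timestamps in the
# file names mean the lexically greatest name is the most recent one.
def latest_snapshot(host_dir)
  Dir.glob(File.join(host_dir, "status-*.txt")).max
end

# Returns one human-readable problem per worker whose latest reported
# status is anything other than UP.
def flag_issues(root)
  issues = []
  Dir.children(root).sort.each do |host|
    file = latest_snapshot(File.join(root, host))
    next if file.nil? # no snapshots, or not a host directory
    File.foreach(file) do |line|
      name, status = line.split
      next if status.nil? # skip blank lines
      issues << "#{host}/#{name} is #{status}" unless status == "UP"
    end
  end
  issues
end
```

A host whose newest file is old has probably stopped reporting altogether, which is worth flagging too; that check is left out here to keep the sketch short.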

If you save the individual files and timestamp them, you would also be able to see any trends forming.
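Keeping every timestamped file makes trends cheap to compute. This sketch assumes the same hypothetical layout (files named `status-<UTC timestamp>.txt`, one `name STATUS` line per worker) and turns one host's history into a series of DOWN counts.

```ruby
# Returns [[timestamp, down_count], ...] in chronological order for one
# host directory of status-<UTC timestamp>.txt files.
def down_count_series(host_dir)
  Dir.glob(File.join(host_dir, "status-*.txt")).sort.map do |file|
    stamp = File.basename(file, ".txt").delete_prefix("status-")
    downs = File.readlines(file).count { |line| line.split[1] == "DOWN" }
    [stamp, downs]
  end
end
```

Printing each pair as `stamp,count` gives a CSV that can be graphed or simply eyeballed for a worker type that keeps dying.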

Just a quick idea.

BTW, the above system is used to monitor the servers of one of the largest websites in the world. Our scripts are written in Perl with a little bit of shell script, but I don't see why you couldn't write your monitoring scripts in Ruby as well.

HTH

cheers,

Rob Wells
A:

If you are already using Monit, then M/Monit sounds like a perfect match: "M/Monit expand upon Monit's capabilities to provide monitoring and management of all Monit enabled hosts from one simple to use web-interface." - http://mmonit.com/

Jonas Elfström
This is good stuff, but I'd like to add some custom metrics, for example "how many jobs has the worker processed" and "how much longer does it think it's going to take". The workers already have that information available; I just need to get it into the control panel. It doesn't seem that M/Monit can do custom metrics.
Teflon Ted
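One way to get such worker-side numbers out of the process is to have each worker dump them to a small file that any collector can read. The sketch below uses a hypothetical `WorkerMetrics` helper and a per-worker JSON file; it estimates the remaining time from the average rate so far (remaining jobs divided by jobs per second), which is only a rough guess when job durations vary. Writing to a temporary file and renaming it over the real one means a reader should never see a half-written file, since rename is atomic on POSIX filesystems when both paths are on the same filesystem.

```ruby
require "json"
require "time"

# Hypothetical helper a worker could call after each job. It records progress
# and rewrites a small JSON status file that a collector can read at any time.
class WorkerMetrics
  attr_reader :processed

  def initialize(path, total_jobs, started_at = Time.now)
    @path = path
    @total = total_jobs
    @started_at = started_at
    @processed = 0
  end

  # Call once per finished job.
  def job_done!(now = Time.now)
    @processed += 1
    write(now)
  end

  # Estimated seconds remaining at the average rate so far:
  # (total - processed) / (processed / elapsed). Nil until a job has finished.
  def eta_seconds(now = Time.now)
    elapsed = now - @started_at
    return nil if @processed.zero? || elapsed <= 0
    (@total - @processed) / (@processed / elapsed)
  end

  private

  # Write to a temp file, then rename it over the real one.
  def write(now)
    data = {
      "jobs_processed" => @processed,
      "jobs_total" => @total,
      "eta_seconds" => eta_seconds(now)&.round,
      "updated_at" => now.getutc.iso8601
    }
    tmp = "#{@path}.tmp"
    File.write(tmp, JSON.generate(data))
    File.rename(tmp, @path)
  end
end
```

A worker would call `job_done!` at the end of each job; a cron script, a monitoring agent or a small web page can then pick the numbers up without talking to the worker directly.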
A: 

I'd suggest taking a look at Zabbix.

It's not as simple as monit, of course, but it lets you run a data-collecting agent on each of your servers, with all agents feeding their data to the central reporting and storage server. Those agents can use any custom scripts to get the metrics: you can write simple scripts that extract the data you need from your workers, send it back to the central reporting server and display it there on the dashboard.
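The Zabbix agent runs such scripts through the `UserParameter` setting in its configuration file, which maps an item key to a command whose standard output becomes the item's value; a flexible entry along the lines of `UserParameter=worker.metric[*],/usr/local/bin/worker_metric.rb $1 $2` would pass the key's parameters to the script. The script below is a hypothetical example of that command: it assumes each worker writes `<name>.json` into a status directory, and the `/var/run/workers` default and `WORKER_STATUS_DIR` variable are illustrative only.

```ruby
#!/usr/bin/env ruby
require "json"

# Hypothetical default location; override with WORKER_STATUS_DIR.
STATUS_DIR = ENV.fetch("WORKER_STATUS_DIR", "/var/run/workers")

# Reads <status_dir>/<worker>.json and returns the requested field as text.
# Raises KeyError for an unknown field; a null field becomes an empty string.
def metric_value(status_dir, worker, key)
  # Only allow plain names so an item key cannot point outside status_dir.
  raise ArgumentError, "bad worker name" unless worker.match?(/\A[\w-]+\z/)
  data = JSON.parse(File.read(File.join(status_dir, "#{worker}.json")))
  data.fetch(key).to_s
end

# Invoked by the agent as: worker_metric.rb WORKER KEY
if $PROGRAM_NAME == __FILE__ && ARGV.size == 2
  puts metric_value(STATUS_DIR, *ARGV)
end
```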

morhekil