I have a farm of several physical servers each running a large number of Ruby "workers" (daemon-like processes) and I'd like to be able to monitor the health and progress of these processes from a central location, perhaps with historical graphing like Cacti provides. What's the simplest preferably-open-standard protocol for doing something like that? Please note I'm already using monit to keep the processes up and running and under control; what I'm asking for here is a single point of entry (i.e. dashboard) for checking in on them. Thanks.
G'day,
What about having a monitoring process on each server that checks the status of each process and then writes that out to a flat text file, say once every five minutes.
Then another process located on a central server can retrieve at those flat files and trawl through the results and flag any issues.
If you save the individual files and timestamp them, you would also be able to see any trends forming.
Just a quick ideea.
BTW The above system is used to monitor the servers in one of the largest websites in the world. Our scripts are written in Perl with a little bit of shell script but I don't see why you couldn't write your monitoring scripts in Ruby as well.
HTH
cheers,
If you are already using Monit then M/Monit sounds like a perfect match. "M/Monit expand upon Monit's capabilities to provide monitoring and management of all Monit enabled hosts from one simple to use web-interface. " - http://mmonit.com/
I'd suggest to take a look at Zabbix.
It's not as simple as monit, of course, but it allows you to run data collecting agent on each of your servers, with all agents feeding the central reporting and storage server with their data. Those agents can use any custom scripts to get the metrics - you can write simple scripts to extract the data you need from your workers, send it back to the central reporting server and display it there on the dashboard.