I am building a Ruby application that grabs emails sent to a server and logs them to a database. What I don't have right now is a way to fully test the (Email -> Ruby -> Database) stack for downtime. I am using services that test the server Ruby is running on for downtime, and I'm using monit to make sure that the Ruby daemon doesn't go down for too long. Besides manually checking periodically, are there any services I can use to verify:

1) Is my postfix still up and receiving/sending mail

2) Are the messages still making it from my daemon to the database

If not, are there any best practices for monitoring and sending alerts for either of those two scenarios, or any home brew methods that could work reliably?

+1  A: 

You could have a cron job send a "canary" message through the system, and then have a second cron job check that the expected canary message was written to the database (optionally deleting it afterwards).
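A minimal sketch of the two halves, assuming the daemon stores each message's subject line. The `CANARY` prefix, the method names, and the idea of fetching the stored subjects as an array are all made up for illustration; how the message is actually sent and how the database is queried is left to whatever the system already uses.

```ruby
require 'securerandom'
require 'time'

# Hypothetical subject prefix used to recognise canary mail.
CANARY_PREFIX = 'CANARY'

# First cron job: build the canary text with a unique token in the subject.
# Delivering it (sendmail, an SMTP client, a remote host) is not shown here.
def build_canary(from:, to:, token: SecureRandom.hex(8), now: Time.now.utc)
  <<~MAIL
    From: #{from}
    To: #{to}
    Subject: #{CANARY_PREFIX} #{token}
    Date: #{now.rfc2822}

    canary #{token} sent at #{now.iso8601}
  MAIL
end

# Second cron job: given the subjects read back from the database,
# report whether the canary with this token was logged.
def canary_arrived?(subjects, token)
  subjects.include?("#{CANARY_PREFIX} #{token}")
end
```

The token has to be shared between the two jobs somehow, for example by writing it to a file when the canary is sent.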

DGM
I was considering that, but didn't know how hackish this method was... The more I think about it, the more I like it. Any additional thoughts are appreciated.
ThinkBohemian
You could send the canary from a remote system if you want. With adequate timestamping you could trace the latency through the system. If you keep stats on the messages, you may wish to program them to ignore the canaries. Then you need a system to look for the canaries and email you if they don't show up.
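The timestamping and the stats filter could look roughly like this. It assumes the canary carries its send time as an ISO 8601 string and that stored rows are hashes with a `:subject` key; both are assumptions for the sake of the sketch, not something the daemon is known to do.

```ruby
require 'time'

# Hypothetical pattern that marks a stored message as a canary.
CANARY_SUBJECT = /\ACANARY\b/

# Seconds between the send time embedded in the canary and the time
# the daemon wrote the row to the database.
def canary_latency(sent_at_iso8601, stored_at)
  stored_at - Time.iso8601(sent_at_iso8601)
end

# Drop canaries before computing statistics on real traffic.
def without_canaries(rows)
  rows.reject { |row| CANARY_SUBJECT.match?(row[:subject].to_s) }
end
```

If the canary comes from a remote system, the latency is only meaningful when both clocks are synchronised.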
DGM
A: 

If you are looking for a commercial tool, www.logicmonitor.com can do that. It has Postfix monitoring (monitoring status, and graphing messages queued, delivered, bounced, rejected, etc.). It also has database monitoring (MySQL, Oracle, PostgreSQL, SQL Server), so it will alert and trend on the database, and provide some advice on tuning if needed. It can also easily track things like the time of the last insertion of an email into the database, and alert if that time is greater than expected. It can also monitor log files to track application response times.

Whether that is worth it depends on the criticality of this system and the time/money trade-off.

Steve Francis
A: 

I'm not experienced with Monit, but it might be doable to set up these tests. If it isn't, I'd recommend you have a look at Nagios - the API for writing your own tests is really simple.

Test that:

  • The SMTP server is responding on the network.
  • The postfix processes are running.
  • The postfix queues are empty.
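As an illustration of the first check, here is a rough sketch in the style of a Nagios plugin, which reports its result through a one-line message and an exit code (0 for OK, 2 for CRITICAL). The method names and the 5 second timeout are arbitrary choices, and it only looks at the SMTP greeting; the process and queue checks are not covered.

```ruby
require 'socket'
require 'timeout'

# Nagios plugin exit codes used by this sketch.
OK = 0
CRITICAL = 2

# A healthy SMTP server opens the conversation with a 220 reply.
def classify_banner(banner)
  text = banner.to_s.strip
  if text.start_with?('220')
    [OK, "SMTP OK - #{text}"]
  else
    [CRITICAL, "SMTP CRITICAL - unexpected greeting: #{text.inspect}"]
  end
end

# Connect, read the greeting line, and treat any failure as CRITICAL.
def check_smtp(host, port = 25, timeout: 5)
  Timeout.timeout(timeout) do
    Socket.tcp(host, port, connect_timeout: timeout) do |sock|
      classify_banner(sock.gets)
    end
  end
rescue StandardError => e
  [CRITICAL, "SMTP CRITICAL - #{e.class}: #{e.message}"]
end

# Typical use from a plugin script:
#   status, line = check_smtp('localhost')
#   puts line
#   exit status
```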

Testing the round trip could be done with DGM's "canary" suggestion. If you can set up a routine like that, it probably has the best precision and gives you the quickest reaction to an error.

An alternative that can be useful if traffic is relatively frequent is to monitor the logs of postfix and possibly the database server - check that the last successful message is no older than for example 30 minutes (for an appropriate value of "30"... and "minutes"...). This approach will be slower to react, but will cover more possible error conditions.
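The freshness test itself is small once the timestamp of the last successful message has been obtained. The sketch below assumes that timestamp is already available as a `Time` (from a log line or the newest database row); the method name and the 15 minute warning threshold are invented for the example.

```ruby
# Classify how long ago the last successful message was seen.
# last_success_at is a Time, or nil if no message has been recorded.
def freshness_status(last_success_at, now: Time.now, warn_after: 15 * 60, crit_after: 30 * 60)
  return :critical if last_success_at.nil?

  age = now - last_success_at # seconds since the last success
  if age > crit_after
    :critical
  elsif age > warn_after
    :warning
  else
    :ok
  end
end
```

The thresholds need to match the real traffic pattern, otherwise a quiet night will trigger false alarms.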

Anders Lindahl