I am the primary developer on a very sensitive system for my company. The code is designed fairly well, but it has a few flaws that make it somewhat unstable. We are working to fix those flaws, but in the meantime things go wrong from time to time. If the wrong thing fails, the consequences for the company could be severe, so in the interim it's imperative that we identify and fix problems very quickly. Longer term, I would like an automated monitoring system that runs sanity checks on data and other things and notifies us of problems as they occur. For now, I am looking for advice on how to make sure nothing catastrophic happens before we get to that point.
We have several checks (mostly data checks that can be done with a simple SQL query) that need to run every day, others that should run weekly, and others monthly. In the past I have given these queries to other people and made it their job to run them on schedule. Unfortunately, because humans are imperfect and turnover is inevitable, we keep discovering problems later than we would have liked because one or more of these manual checks was not run.
Can someone offer advice, recommend an application that might help me manage these scripts, or point me to an existing application that does some of this work for me? At this point my only option is a free application, but if someone suggests something that is not free I will put it on the list of things to consider later. I know my company has an OpenNMS monitoring system in place, but the people in charge will not give me enough control to configure it for my system, and they don't respond to my requests to set up the monitoring themselves. My company has also used Nagios in the past, but I don't think either tool does exactly what I want, since I'm not primarily looking for web monitoring.
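To make the kind of check concrete, here is a minimal sketch of what I have in mind, not something we run today. Everything in it is an assumption made up for illustration: the `orders` table, the check names, the queries, the rule that weekly checks run on Mondays and monthly checks on the 1st, and the use of SQLite in place of our real database. Each query returns a count of offending rows, and zero means the check passed.

```python
import sqlite3
from datetime import date

# Hypothetical checks: (name, frequency, query returning a count of bad rows).
CHECKS = [
    ("orphaned_orders", "daily",
     "SELECT COUNT(*) FROM orders WHERE customer_id IS NULL"),
    ("negative_totals", "weekly",
     "SELECT COUNT(*) FROM orders WHERE total < 0"),
]


def is_due(frequency, today):
    """Decide whether a check with this frequency should run on `today`."""
    if frequency == "daily":
        return True
    if frequency == "weekly":
        return today.weekday() == 0  # assumption: weekly checks run on Mondays
    if frequency == "monthly":
        return today.day == 1  # assumption: monthly checks run on the 1st
    raise ValueError(f"unknown frequency: {frequency}")


def run_due_checks(conn, checks, today):
    """Run every check that is due and return (name, bad_row_count) failures."""
    failures = []
    for name, frequency, query in checks:
        if not is_due(frequency, today):
            continue
        bad_rows = conn.execute(query).fetchone()[0]
        if bad_rows:
            failures.append((name, bad_rows))
    return failures


if __name__ == "__main__":
    # Demo against an in-memory database with made-up rows.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, total REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 10.0), (None, 5.0), (2, -3.0)],
    )
    for name, bad_rows in run_due_checks(conn, CHECKS, date.today()):
        print(f"CHECK FAILED: {name} ({bad_rows} bad rows)")
```

A script like this could be started once a day by cron or a similar scheduler, with the print replaced by an email or other alert, so that no person has to remember which checks are due.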
I appreciate any help or advice.