Hi,

This is more of a "general architecture" problem. If you have a cron job (or even a Windows scheduled task) running periodically, it's fairly simple to have it send you an email / text message saying that all is well, but how do I get informed when everything is NOT okay? Basically, what if the job doesn't run at its scheduled time, or Windows / Linux has its own set of hangups that prevent the task from running?

Just seeking thoughts of people who've faced this situation before and come up with interesting solutions...

A: 

A way I've done it in the past is to simply put at the top of each script (say, checkUsers.sh):

touch /tmp/lastrun/checkUsers.sh

then have another job that runs periodically and uses find to locate all those "marker" files in /tmp/lastrun that are older than a day.

You can fiddle with the timings, having /tmp/lastrun/hour/ and /tmp/lastrun/day/ to separate jobs that have different schedules.
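
As a rough sketch of that checker job (not the exact script I used): the function name `stale_markers`, the 1440-minute threshold, and the `mail` address are all assumptions for illustration, and `-mmin` is an extension found in GNU and BSD find rather than strict POSIX.

```shell
#!/bin/sh
# Hypothetical checker: print marker files that have not been touched
# within the allowed window. Names and thresholds here are illustrative.
stale_markers() {
    marker_dir=$1     # e.g. /tmp/lastrun
    max_age_min=$2    # e.g. 1440 for "older than a day"
    # -mmin +N matches files last modified more than N minutes ago.
    find "$marker_dir" -type f -mmin +"$max_age_min"
}

# Possible use from the checker's own cron entry (assumes a working mail command):
# stale=$(stale_markers /tmp/lastrun 1440)
# [ -n "$stale" ] && printf 'Stale cron markers:\n%s\n' "$stale" | mail -s 'cron check' you@example.com
```

Each schedule directory can get its own call with a different threshold, for example 60 minutes for the hourly jobs.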

Note that this won't catch scripts that have never run since they will never create the initial file for find-ing. To alleviate that, you can either:

  • create that file manually when creating the cron job (won't handle situations where someone inadvertently deletes the marker file); or
  • maintain a list of required marker files somewhere so that you can detect when they're missing as well as outdated.
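
The second option could look something like the sketch below. The list file format (one marker name per line), the function name `check_required`, and the `MISSING` / `STALE` labels are assumptions made up for this example.

```shell
#!/bin/sh
# Hypothetical check against a list of required markers.
# Reports markers that do not exist as well as ones that are too old.
check_required() {
    marker_dir=$1     # directory holding the marker files
    list_file=$2      # text file with one required marker name per line
    max_age_min=$3    # maximum allowed age in minutes
    while IFS= read -r name; do
        # Skip blank lines in the list.
        [ -z "$name" ] && continue
        path="$marker_dir/$name"
        if [ ! -e "$path" ]; then
            echo "MISSING $name"
        elif [ -n "$(find "$path" -mmin +"$max_age_min")" ]; then
            # find prints the path only if it is older than the threshold.
            echo "STALE $name"
        fi
    done < "$list_file"
}
```

Because the list lives outside the marker directory, a deleted marker shows up as missing instead of silently dropping out of the check.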

And, if your cron job is not a script, put the touch directly into crontab:

0 4 * * * ( touch /tmp/lastrun/daily/checkUsers ; /usr/bin/checkUsers )

It's a lot easier to validate a simple find script than to validate every one of your cron jobs.

paxdiablo
Forgive me if this is a stupid question, but if one cron job fails, wouldn't it be likely that they all would (if it were a hardware or software error)?
alex
Not necessarily. The problem may be with the line in the crontab file, or it may be permissions on the script. If there's a fault with cron itself, then yes. In that case you can put the check in your login script (or `/etc/profile`), or mount an NFS filesystem from elsewhere and touch files on that, hoping that the other machine's cron will be running the checks. But if `cron` itself is not working, there are few other options for automating a check. At some point, you have to trust that _one_ layer of software will function okay :-) I didn't say it was foolproof; all it does is make it easier to detect a fault.
paxdiablo
Interesting approach... not _exactly_ what I was looking for... (a more complete solution, possibly using external systems as well)
DrMHC