We have 50+ Java batch processes that run at different times of the day. They run on a Solaris box and are started via cron. Currently, the only way we know whether they succeed or fail is an email generated at the end of each batch process. We have a support team that monitors these emails. Recently, we've had issues with emails not being received even though the batches are running. There must be a better way.

Without having to reinvent the wheel, are there any open source batch monitoring applications?

And a more general question: what is the best way to monitor batch processes?

+3  A: 

Is there currently some batch management system in place? Or are the jobs run through the OS scheduler (i.e., Windows Scheduled Tasks or *nix cron)?

Quartz is an open source (Apache License) Java-based job scheduler with infrastructure in place for listeners that can be used for notification purposes, but there would be some code development involved.

Ken Gentle
Sorry for leaving that info out. I updated the question with "They run on a Solaris box, and are started via cron." I'll take a look at Quartz.
Neal Swearer
+3  A: 

I don't know about open source batch monitoring applications, but there is a new subproject of Spring, Spring Batch, that provides a batch processing framework. I've used it successfully in a few new projects.

When you kick off a batch job you can wire up a job execution listener. In my case, when the job fails with an exception, my execution listener intercepts it and sends an error email with the pertinent stack trace to a well-known email list. I use a Tasklet at the end of the batch to send an email indicating normal completion.

Of course, if there is an error in the email subsystem (and the message doesn't get sent) all bets are off...
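To make the listener idea concrete, here is a minimal plain-Java sketch of the pattern. It is not Spring Batch's API: `BatchListenerSketch`, `Listener`, `RecordingListener` and `run` are hypothetical names made up for illustration, and the listener only records messages in memory where a real one would send the email or report to a monitor.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: all names here are hypothetical, not Spring Batch types.
class BatchListenerSketch {

    // Called once after every job run; failure is null when the job succeeded.
    interface Listener {
        void afterJob(String jobName, Throwable failure);
    }

    // Keeps notifications in memory; a real listener would email or alert instead.
    static class RecordingListener implements Listener {
        final List<String> messages = new ArrayList<>();

        @Override
        public void afterJob(String jobName, Throwable failure) {
            if (failure == null) {
                messages.add(jobName + ": COMPLETED");
            } else {
                // Capture the stack trace as text so it can go into the message.
                StringWriter trace = new StringWriter();
                failure.printStackTrace(new PrintWriter(trace, true));
                messages.add(jobName + ": FAILED\n" + trace);
            }
        }
    }

    // Runs the job, notifies the listener exactly once, and returns true on success.
    static boolean run(String jobName, Runnable job, Listener listener) {
        Throwable failure = null;
        try {
            job.run();
        } catch (RuntimeException e) {
            failure = e;
        }
        listener.afterJob(jobName, failure);
        return failure == null;
    }
}
```

The point of routing both outcomes through one listener is that success and failure notifications share a single code path, so swapping email for some other channel means changing only the listener.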

+1  A: 

There must be a way to use Nagios to see if daily tasks have run successfully, given that it can monitor things in so many different ways (from PID files to text files being present, to trawling log files, etc). Sadly Nagios isn't in my line of work so I can't go further.
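As a rough sketch of the "text file being present" approach: each batch could write a small status file as its last step, and a separate check could verify that the file exists, reports success, and is recent. The class name, method names, and file format below are assumptions for illustration, not part of Nagios; a Nagios check would typically wrap logic like `isHealthy` and report the result through its exit code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.time.format.DateTimeParseException;

// Illustrative only: names and the "STATUS timestamp" file format are hypothetical.
class StatusFileSketch {

    // Called by the batch as its last step: records the outcome and the finish time.
    static void writeStatus(Path file, boolean success, Instant finishedAt) throws IOException {
        String line = (success ? "OK" : "FAILED") + " " + finishedAt;
        Files.write(file, line.getBytes(StandardCharsets.UTF_8));
    }

    // Called by the monitor: healthy only if the file exists, says OK, and is recent enough.
    static boolean isHealthy(Path file, Duration maxAge, Instant now) throws IOException {
        if (!Files.exists(file)) {
            return false; // the batch never ran, or never reached its last step
        }
        String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
        String[] parts = content.split(" ", 2);
        if (parts.length != 2 || !parts[0].equals("OK")) {
            return false;
        }
        try {
            Instant finishedAt = Instant.parse(parts[1]);
            // A stale file means the most recent scheduled run did not complete.
            return Duration.between(finishedAt, now).compareTo(maxAge) <= 0;
        } catch (DateTimeParseException e) {
            return false; // treat an unreadable timestamp as unhealthy
        }
    }
}
```

Because the monitor pulls the status instead of waiting for a message, a batch that never ran or a notification that got lost shows up as a stale or missing file rather than as silence.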

JeeBee