We are using Nagios to monitor our network with great results. There is now a new requirement we are struggling with:
We want to notify Nagios of non-fatal but critical application errors. The application does not stop running, but there is some sort of issue that needs looking into.
Once the issue has been looked into, we need some way to "unflag" the issue in Nagios.
We tried using the syslog, but the biggest problem was that once an error was logged, the service was put into an error state with no way to recover. Also, while applications do report critical errors to the syslog, most of the time they don't log a matching "all clear" message.
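
To make the flag/unflag cycle concrete, here is a rough sketch of what we have in mind, assuming Nagios passive service checks submitted through the external command file would be acceptable. The host name `app01`, the service name `app_errors`, and the command file path are placeholders rather than our real configuration, and the service would have to be defined in Nagios with passive checks enabled.

```python
import time

# Assumed location of the Nagios external command file (a named pipe);
# the real path depends on how Nagios was installed.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"

# Standard Nagios service state codes.
OK, WARNING, CRITICAL = 0, 1, 2


def passive_result(host, service, state, message, now=None):
    """Format a PROCESS_SERVICE_CHECK_RESULT external command line."""
    ts = int(time.time() if now is None else now)
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{state};{message}\n"


def submit(line, command_file=COMMAND_FILE):
    """Write one command line to the Nagios command file."""
    with open(command_file, "w") as f:
        f.write(line)


if __name__ == "__main__":
    # The application flags a non-fatal but critical error:
    print(passive_result("app01", "app_errors", CRITICAL, "Payment queue stalled"), end="")
    # After someone has looked into it, the same service is reset by hand:
    print(passive_result("app01", "app_errors", OK, "Acknowledged and cleared"), end="")
```

The point of the sketch is that the "all clear" would not have to come from the application itself: whoever investigates the issue could send the OK result, for example by calling `submit()` from a small script.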