Our team has a number of processes which we start manually but which may run for many days. The processes do various things to large numbers of entities (web pages, database rows, images, files, etc). Obviously there are failures from time to time, and we have to design our processes to handle these failures gracefully and move on so that the whole job is not brought down.
Depending on the particular process in question, the rate, severity and urgency of failures vary. In some cases we send emails when a rare but important error happens, in other cases we just log it and move on, and so on.
The problem is that we have different error handling code scattered everywhere and more often than not when we "log it and move on" no one ever goes back and reads the logs, so no one ever knows what problems occurred. We can't default to email for all problems because there would simply be too many emails.
These are long-running processes, but they are not daemons, where something like SNMP or Nagios might feel like a good fit. Surely this is a fairly common problem, but I cannot seem to find many solutions online. I've heard people talk about using log4j (or other similar logging packages) to log to a database, which seems like it might be a step in the right direction, but surely there are more sophisticated solutions out there by now? I'm imagining something where your logger writes events to a database and there's a Nagios-like web interface that lets you see which errors are happening in which processes in real time, as well as configure email alerts for specific patterns.
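To make the idea concrete, here is a rough sketch of the logger half of what I have in mind, using only Python's standard `logging`, `re` and `sqlite3` modules. Everything named here (`SQLiteEventHandler`, the `events` table, the `on_alert` callback, the `image_resizer` process name) is made up for illustration; it is not an existing tool, and a real version would presumably send an email in the callback and point at a shared database rather than an in-memory one.

```python
import logging
import re
import sqlite3


class SQLiteEventHandler(logging.Handler):
    """Hypothetical handler: stores every log record as a database row and
    calls an alert callback when the message matches a configured pattern."""

    def __init__(self, db_path, alert_patterns=(), on_alert=None):
        super().__init__()
        # Note: a sqlite3 connection is tied to the thread that created it
        # by default, so this sketch assumes single-threaded logging.
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "created REAL, process TEXT, level TEXT, message TEXT)"
        )
        self.alert_patterns = [re.compile(p) for p in alert_patterns]
        self.on_alert = on_alert

    def emit(self, record):
        try:
            message = record.getMessage()
            with self.conn:  # commits the insert on success
                self.conn.execute(
                    "INSERT INTO events VALUES (?, ?, ?, ?)",
                    (record.created, record.name, record.levelname, message),
                )
            # Alert only on messages matching one of the configured patterns.
            if self.on_alert and any(p.search(message) for p in self.alert_patterns):
                self.on_alert(record.name, message)
        except Exception:
            self.handleError(record)


# Example usage: collect alerts in a list instead of sending email.
alerts = []
handler = SQLiteEventHandler(
    ":memory:",
    alert_patterns=[r"disk full"],
    on_alert=lambda process, message: alerts.append((process, message)),
)
log = logging.getLogger("image_resizer")
log.setLevel(logging.INFO)
log.addHandler(handler)

log.warning("skipped %s: bad header", "a.png")    # stored, no alert
log.error("disk full while writing %s", "b.png")  # stored, triggers alert
```

Every event lands in the `events` table where a web interface could query it, and only messages matching a pattern trigger the callback, which is how I would hope to avoid the flood of emails.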
Does something like this exist? If not, what approaches have you used to successfully deal with similar issues?
(For what it's worth, most of our codebase is in Python, but I would imagine any decent implementations of this idea are largely non-language-specific, and obviously any conceptual solutions would be as well.)
Update: I just spent some time looking at Chainsaw, which is kind of what I am looking for, but I'd like it to be a webapp instead of a desktop app, and have alerting functionality.
Update: I just discovered hoptoadapp and exceptional which are both somewhat along the lines of what I was thinking, though both target Rails specifically.