Sometimes severe bugs (new or reintroduced) go unnoticed in production for days or weeks, because customers do not always notify us. The only tools I have now are grep, awk and perl, so I am purely reactive: I only start looking once someone complains.
I want to be proactive and be notified when a certain error has occurred a certain number of times in a given time period. But I don't want to be spammed with a notification for every single error.
Are there any lightweight, open-source solutions that work for a cluster of servers? Notification by email, SMS or RSS is fine. It would also be nice to view reports and trends in a graph, but that is not necessary.
Currently I use Apache Log4j, and I know I can send email alerts with it. But as I said, I don't want to be emailed for every single error. I want the system to have some intelligence about when it should notify me, and I want that intelligence outside of my application code.
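
To make the rule concrete, here is a rough sketch of the kind of logic I mean: alert once a matching error has been seen a given number of times inside a sliding time window, then stay quiet for a while. It is only an illustration under assumptions, not an existing tool: the class name `ThresholdAlerter`, the `cooldown_seconds` parameter and the sample error pattern are all made up for this example, and timestamps are passed in as plain seconds rather than parsed from real log lines.

```python
import re
from collections import deque


class ThresholdAlerter:
    """Hypothetical helper: signal when `threshold` matching lines
    fall within `window_seconds`, at most once per `cooldown_seconds`."""

    def __init__(self, pattern, threshold, window_seconds, cooldown_seconds):
        self.pattern = re.compile(pattern)
        self.threshold = threshold
        self.window = window_seconds
        self.cooldown = cooldown_seconds  # assumption: suppresses repeat alerts
        self.hits = deque()               # timestamps of recent matching lines
        self.last_alert = None

    def observe(self, line, now):
        """Feed one log line seen at time `now` (in seconds).
        Return True if a notification should be sent for this line."""
        if not self.pattern.search(line):
            return False
        self.hits.append(now)
        # Drop matches that have fallen out of the sliding window.
        while self.hits and now - self.hits[0] > self.window:
            self.hits.popleft()
        if len(self.hits) < self.threshold:
            return False
        # Threshold reached, but stay silent during the cooldown period.
        if self.last_alert is not None and now - self.last_alert < self.cooldown:
            return False
        self.last_alert = now
        return True


if __name__ == "__main__":
    # Sample pattern and numbers are placeholders, not real production values.
    alerter = ThresholdAlerter(r"NullPointerException", threshold=3,
                               window_seconds=60, cooldown_seconds=300)
    sample = [(0, "ERROR NullPointerException in OrderService"),
              (10, "INFO request served"),
              (15, "ERROR NullPointerException in OrderService"),
              (20, "ERROR NullPointerException in OrderService"),
              (25, "ERROR NullPointerException in OrderService")]
    for ts, line in sample:
        if alerter.observe(line, ts):
            print(f"ALERT at t={ts}s: {line}")
```

In this sketch the actual delivery (email, SMS, RSS) would hang off the `True` return value, and the watcher would run as a separate process reading the log files, so nothing changes in the application itself.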