Sometimes severe bugs (new or reintroduced) in production go on for days or weeks, and customers do not always notify us. The only tools I have now are grep, awk and perl, and with them I can only react after someone complains.

I want to be proactive and be notified when a certain error has occurred a certain number of times in a given time period. But I don't want to be spammed with a notification for every single error.
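A minimal sketch of the "N errors within a time window" rule, assuming timestamps in seconds and one alerter per error type. It keeps a sliding window of recent error times and adds a cooldown, so a burst of errors produces one notification rather than one per error. The class name and the `threshold`/`window`/`cooldown` parameters are hypothetical, not part of Log4J or any particular tool.

```python
from collections import deque


class ErrorRateAlerter:
    """Hypothetical helper: fire when `threshold` errors arrive within `window`
    seconds, then stay quiet for `cooldown` seconds so one incident sends one alert."""

    def __init__(self, threshold, window, cooldown):
        self.threshold = threshold
        self.window = window
        self.cooldown = cooldown
        self._times = deque()      # timestamps of recent errors, oldest first
        self._last_alert = None    # time of the last alert that fired

    def record(self, ts):
        """Record one error at time `ts` (seconds); return True if an alert should fire."""
        self._times.append(ts)
        # Drop errors that have fallen out of the sliding window.
        while self._times and self._times[0] <= ts - self.window:
            self._times.popleft()
        if len(self._times) < self.threshold:
            return False
        if self._last_alert is not None and ts - self._last_alert < self.cooldown:
            return False           # alerted recently: suppress to avoid spam
        self._last_alert = ts
        return True
```

For example, `ErrorRateAlerter(threshold=3, window=60, cooldown=600)` fires on the third error within a minute, then stays silent for ten minutes even if errors keep arriving.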

Are there any lightweight, open-source solutions for a cluster of servers? Email, SMS or RSS is fine. It would also be nice, though not necessary, to view the reports and trends in a graph.

Currently I use Apache Log4J, and I know I can send email alerts with it. But as I said, I don't want to be emailed for every single error. I want the system to have some intelligence about when it should notify me, and I want that intelligence outside of my application code.
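Keeping that logic outside the application could mean a separate script that reads the log files and counts errors per type. A sketch, assuming a hypothetical Log4J `PatternLayout` of `%d{yyyy-MM-dd HH:mm:ss} %-5p [%c] %m%n`; the regex, the function names and the digit-masking rule are illustrative assumptions to adapt to the real log format.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout (hypothetical): "%d{yyyy-MM-dd HH:mm:ss} %-5p [%c] %m%n".
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) +(?P<level>[A-Z]+) +"
    r"\[(?P<logger>[^\]]+)\] (?P<msg>.*)$"
)


def parse_error(line):
    """Return (timestamp, key) for an ERROR/FATAL line, or None for anything else."""
    m = LINE_RE.match(line)
    if m is None or m.group("level") not in ("ERROR", "FATAL"):
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
    # Key on logger plus message with digits masked, so "timeout after 31ms"
    # and "timeout after 57ms" are counted as the same error.
    key = (m.group("logger"), re.sub(r"\d+", "N", m.group("msg")))
    return ts, key


def count_errors(lines, since):
    """Count error occurrences per key for lines logged at or after `since`."""
    counts = Counter()
    for line in lines:
        parsed = parse_error(line)
        if parsed is not None and parsed[0] >= since:
            counts[parsed[1]] += 1
    return counts
```

Stack-trace continuation lines do not match the pattern and are skipped, so each logged error is counted once, by its first line.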

A: 

Try http://logging.apache.org/log4net/index.html

madan
It would spam me for every single error.
Langali
A:

Can you add something that runs once per day, does all the greps you do, and sends you the results, for example by email? Alternatively, you can send the results to the customer's admin so they can escalate them to you.

SDGator
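A sketch of the summary step, assuming the greps have already been reduced to a mapping of error description to count. It builds a single email with Python's standard `email` package and sends it through an SMTP relay; the function names, the addresses and the `localhost` relay are placeholders.

```python
import smtplib
from email.message import EmailMessage


def build_report(counts, sender, recipient, min_count=1):
    """Build one summary email from {error_description: count}.

    Only errors seen at least `min_count` times are listed, most frequent first.
    Returns None when there is nothing to report, so no empty mail is sent."""
    rows = sorted(((n, k) for k, n in counts.items() if n >= min_count), reverse=True)
    if not rows:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"Error summary: {len(rows)} distinct error(s)"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(f"{n:6d}  {k}" for n, k in rows))
    return msg


def send_report(msg, host="localhost"):
    """Send the report through an SMTP relay (the host is an assumption)."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)
```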
The app is pretty heavily used, so the tool would have to notify me of anything seriously wrong within an hour at most.
Langali
Is this a system where you can add a checking task to the crontab to run automatically as often as you need? Since it is so heavily used, a hacky solution might be for the app itself to check how long it's been since the last error scan and start a new one if it has been too long.
SDGator
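A sketch of the "app checks when it last scanned" idea, assuming the scan itself is some callable you already have. The hypothetical `ScanTrigger` is meant to be called on each request: the check is a cheap clock comparison, and a non-blocking lock keeps concurrent requests from starting duplicate scans.

```python
import threading
import time


class ScanTrigger:
    """Hypothetical helper: run `scan` at most once per `interval` seconds,
    piggybacking on normal app activity instead of a cron job."""

    def __init__(self, scan, interval, clock=time.monotonic):
        self._scan = scan
        self._interval = interval
        self._clock = clock           # injectable so the logic can be tested
        self._last = None             # time the last scan was started
        self._lock = threading.Lock()

    def maybe_scan(self):
        """Start a scan if none has run in the last `interval` seconds; return True if one ran."""
        now = self._clock()
        if self._last is not None and now - self._last < self._interval:
            return False              # fast path: scanned recently
        if not self._lock.acquire(blocking=False):
            return False              # another thread is claiming this scan
        try:
            now = self._clock()
            if self._last is not None and now - self._last < self._interval:
                return False          # another thread just started a scan
            self._last = now          # claim this interval before scanning
        finally:
            self._lock.release()
        self._scan()
        return True
```

As written, the scan runs on the thread of whichever request triggers it; in a busy app you would probably hand it to a background thread instead. The cron route needs no code in the app at all: a crontab schedule of `0 * * * *` runs the scan command at the top of every hour.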