I'm looking into setting up monitoring for various aspects of our systems and wanted to see if anyone has any best practices or methodologies they use. One thing I'm trying to figure out is the best way to prevent cascading alarms.
For example, if the database is down, you can expect alarms X, Y, and Z to go off, but X, Y, and Z aren't the core problem, so I don't want to be bothered with that noise.
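To make the idea concrete, here is a rough sketch of the behavior I'm after. It assumes every check declares which other checks it depends on; the `DEPENDS_ON` map, the check names, and the `root_causes` function are all made up for illustration and don't come from any particular monitoring tool. An alarm is only reported if nothing upstream of it is also failing.

```python
# Hypothetical dependency map: each check lists the checks it depends on.
# The names mirror the example above and are placeholders only.
DEPENDS_ON = {
    "database": [],
    "X": ["database"],
    "Y": ["database"],
    "Z": ["database"],
}


def root_causes(failing, depends_on):
    """Return the failing checks that have no failing upstream dependency."""
    failing = set(failing)

    def has_failing_upstream(check, seen):
        # Walk the dependency chain; `seen` guards against cycles.
        for parent in depends_on.get(check, []):
            if parent in seen:
                continue
            seen.add(parent)
            if parent in failing or has_failing_upstream(parent, seen):
                return True
        return False

    return sorted(c for c in failing if not has_failing_upstream(c, {c}))


# Database is down, so X, Y, and Z are suppressed as downstream noise.
print(root_causes(["database", "X", "Y", "Z"], DEPENDS_ON))  # ['database']

# X fails on its own, so it is reported as a real problem.
print(root_causes(["X"], DEPENDS_ON))  # ['X']
```

In this sketch the suppression is transitive, so a failure several levels upstream also silences everything below it. Whether that is the right call probably depends on how much you trust the dependency map.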