views:

105

answers:

2

I have several dozen 64-bit Windows 2003 servers in a high performance environment with very bursty system utilization. I am looking for a tool (or tools) to monitor and analyze system performance (eg, CPU utilization, bandwidth, etc).

This tool can either query servers from a central location (SNMP) or require installation of a component on each server. It should poll on a 1-second interval. It should be able to generate pretty graphs which show trends over time. As a nice-to-have, it should be able to send out emails or IMs when certain thresholds are exceeded.

The tools I have investigated so far, including Solarwinds and PRTG, are not designed to poll this frequently. They seem to be designed for a ~30-second interval.

Solarwinds wouldn't go lower than 3-sec, and PRTG chokes at 1-sec. They both default to 1-min. All three tools seem more focused on outage monitoring and reporting than metric collection. Given the bursty nature of our applications, such infrequent polling would result in a very inaccurate picture of performance.

I am considering rolling my own solution using perfmon. This would be a lot of work, and seems like reinventing the wheel. Are there any tools out there that meet my needs?

A: 

We had the same problem, and we chosse to use http://www.nagios.org/ We need to monitor severals servers, and custom services in each server, we need a notification when a service is down, or when a server is out of disk's free space. Nagios does all of that for us

Chocolim
A: 

HP Systems Insight Manager is the free knock-off of OpenView - check it out its event rule system is painful to set up, but when it's up and working it's useful.