views:

99

answers:

4

Hi all

I have been asked to monitor SQL Server (2005 & 2008) and am wondering what are good metrics to look at? I can access WMI counters but am slightly lost as to how much depth is going to be useful.

Currently I have on my list:

  • user connections
  • logins per second
  • latch waits per second
  • total latch wait time
  • dead locks per second
  • errors per second
  • Log and data file sizes

I am looking to be able to monitor values that will indicate a degradation of performance on the machine or a potential serious issue. To this end I am also wondering at what values some of these things would be considered normal vs problematic?

As I reckon it would probably be a really good question to have answered for the general community I thought I'd court some of you DBA experts out there (I am certainly not one of them!)

Apologies if a rather open ended question. Ry

+3  A: 

I would also monitor page life expectancy and your buffer cache hit ratio, see Use sys.dm_os_performance_counters to get your Buffer cache hit ratio and Page life expectancy counters for details

SQLMenace
+1  A: 

Use SQL Profiler to identify your Top 10 (or more) queries. Create a baseline performance for these queries. Review current average execution times vs. your baseline, and alert if significantly above your baseline. You can also use this list to identify queries for possible optimization.

This attacks the problem at a higher level than just reviewing detailed stats, although those stats can also be useful. I have found this approach to work on any DBMS, including MySQL and Oracle. If your top query times start to go up, you can bet you are starting to run into performance issues, which you can then start to drill into in more detail.

Gary
A: 

Budget permitting, it's worth looking at some 3rd party tools to help. We use Idera's SQL Diagnostic Manager to monitor server health and Confio's Ignite to keep an eye on query performance. Both products have served us well in our shop.

Joe Stefanelli
A: 

Percent CPU utilization and Average disk queue lengths are also pretty standard. CPUs consistently over 80% indicates you may need more or better CPUs (and servers to house them); Consistently over 2 on any disk queue indicates you have a disk I/O bottleneck on that drive.

Philip Kelley