I need to be able to tell whether anyone is active on a series of servers and, if not, automatically shut them down (turn off the VM). It's not a trivial task, because I have 1000+ server instances that include an assortment of OSes (Win, Unix, Linux) and many different types of configurations. This makes installing an uptime agent on the boxes non-trivial. Also, because the users are admins, I can't really ensure that tools I install will not be tampered with.

So my idea is to treat each server as a black box and use stats from outside of the server to decide if there's activity:

  • Monitor all servers for disk and CPU activity.
  • If disk writes and CPU activity fall to zero for 1 hr then assume system is idle and shut it down.
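To make the rule above concrete, here is a minimal sketch of the check I have in mind. It assumes the metrics have already been collected from outside the guest (for example from the hypervisor); the `Sample` type, the `is_idle` function, and the threshold parameters are hypothetical names for illustration, not part of any real monitoring API. The defaults of zero mirror the rule as stated, and the thresholds could be raised if guests never go completely quiet.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Sample:
    """One externally observed measurement for a VM (hypothetical type)."""
    timestamp: float       # seconds since epoch
    cpu_percent: float     # CPU usage as seen from outside the guest
    disk_write_bps: float  # disk write rate in bytes per second


def is_idle(
    samples: Sequence[Sample],
    now: float,
    window_s: float = 3600.0,  # the 1 hr window from the rule above
    cpu_max: float = 0.0,      # assumption: treat <= cpu_max as "no CPU activity"
    disk_max: float = 0.0,     # assumption: treat <= disk_max as "no disk writes"
) -> bool:
    """Return True only if every sample in the last window_s seconds is quiet."""
    recent = [s for s in samples if now - window_s <= s.timestamp <= now]
    if not recent:
        # No data is not evidence of idleness.
        return False
    if min(s.timestamp for s in samples) > now - window_s:
        # History does not reach back a full window yet (e.g. the VM just booted).
        return False
    return all(
        s.cpu_percent <= cpu_max and s.disk_write_bps <= disk_max
        for s in recent
    )
```

A scheduler would call `is_idle(history_for_vm, now=time.time())` for each VM and power off those that return `True`; how the samples are gathered and how the shutdown is issued depend on the virtualization platform and are left out here.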

I don't care to turn off more machines than are truly idle, if I have something like 90% accuracy. Would the above black-box approach work, or would it be unreliable? Which black-box metrics would be more appropriate?