I need to be able to tell whether a series of servers have anyone active on them, and if not, to automatically shut them down (turn off the VM). It's not a trivial task, because I have 1000+ server instances that include an assortment of OSes (Windows, Unix, Linux) and many different types of configurations. This makes installing an uptime agent on the boxes non-trivial. Also, because the users are admins, I can't really ensure that tools I install will not be tampered with.
So my idea is to treat each server as a black box and use stats from outside of the server to decide if there's activity:
- Monitor all servers for disk and CPU activity.
- If disk writes and CPU activity fall to zero for 1 hour, assume the system is idle and shut it down.
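For illustration, the rule above could be expressed roughly as follows. This is a minimal sketch, assuming the hypervisor already exposes per-VM samples of CPU usage and bytes written per sampling interval; the `Sample` type, the `is_idle` function and its threshold parameters are hypothetical names, not part of any monitoring product. The thresholds default to 0 to match the "fall to zero" rule, though host-side CPU figures may rarely hit exactly zero, so a small nonzero threshold may be needed in practice.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    # Hypothetical host-side measurement for one VM over one sampling interval.
    timestamp: float       # seconds since epoch
    cpu_percent: float     # CPU usage of the VM as seen by the hypervisor, 0-100
    disk_write_bytes: int  # bytes written during the interval


def is_idle(samples: List[Sample], window_seconds: float = 3600.0,
            cpu_threshold: float = 0.0, disk_threshold: int = 0) -> bool:
    """Return True only if the samples cover the whole trailing window and
    every sample in that window is at or below both thresholds."""
    if not samples:
        return False
    ordered = sorted(samples, key=lambda s: s.timestamp)
    end = ordered[-1].timestamp
    start = end - window_seconds
    # Not enough history yet: never call a VM idle on partial data.
    if ordered[0].timestamp > start:
        return False
    recent = [s for s in ordered if s.timestamp >= start]
    return all(s.cpu_percent <= cpu_threshold and
               s.disk_write_bytes <= disk_threshold for s in recent)


if __name__ == "__main__":
    # One hour of 5-minute samples with no activity at all.
    quiet = [Sample(float(t), 0.0, 0) for t in range(0, 3601, 300)]
    print(is_idle(quiet))
```

Requiring the samples to span the full window means a VM that was only recently started or recently added to monitoring is never shut down on incomplete data.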
I don't mind if a few more machines get turned off than are truly idle, as long as I get something like 90% accuracy. Would the above black-box approach work, or would it be unreliable? What black-box metrics would be more appropriate?