This might be more of a serverfault.com question but a) it doesn't exist yet and b) I need more rep for when it does :~)
My employer has a few hundred servers (all *NIX) spread across several locations. As I suspect is common we don't really know how many servers we have: more than once I've been surprised to find a server that's been up for 5 years, apparently doing nothing but elevating the earth's temperature slightly. We have a number of databases that store bits of server information -- Puppet, Cobbler, Nagios, Cacti, our load balancers, DNS, various internal spreadsheets and so on but it's all very disparate, incomplete and overlapping. Maintaining this mess costs time and money.
So, I'd like to come up a single database which holds details of what each server is (hardware specs, role, etc) and replaces (or at least supplies data for) the databases mentioned above. The database and web interface are likely to be a Rails app as this is what I have most experience with. I'm more of a sysadmin than a coder.
Has this problem already been solved? I can't find any open source software that really fits the bill and I'm generally not too keen on bloaty, GUI vendor-supplied solutions.
How should I implement the device information collection bit? For instance, it'd be great to the database update device records when disks are added or removed, or when the server serial number changes because HP replace the board. This information comes from many different sources: dmidecode
, command-line disk tools, SNMP against the server or its onboard lights-out card, and so on. I could expose all this through custom scripts and net-snmp
, or I could run a local poller that reported the information back to the central DB (maybe via a RESTful interface or something). It must be easily extensible.
Have you done this? How? Tell me your experiences, discoveries, mistakes and recommendations!