I usually deal with at least two dozen servers on a weekly basis. I thought it would be a good idea to create a script that queries each of them and determines whether they're "up" or not.

  • The servers run either Apache2 or IIS7.
  • The hosting providers vary.
  • There are usually multiple sites on each server.
  • The setups are inconsistent: there isn't always a default Apache "hello world" page when you access the IP directly.
  • All sites are configured as virtual hosts.

Would the best way to determine whether a server is up be to take one site from each server and make an HTTP HEAD request, checking that the response is 200 OK? Obviously this would be prone to a "false positive" if:

  1. The site's configuration improperly returns a 200 OK when it should return a 4xx error code.
  2. An individual site's ( <VirtualHost> ) configuration is disabled, or the site has moved to a different server.

But for the most part, relying on a 200 OK response to a HEAD request should be reliable, right? I would also check that the domain's A record matches the IP it is listed under, in case a site has moved.

Pseudo code:

import http

list = {
  '72.0.0.0' : 'domain.com',
  '71.0.0.0' : 'blah.com',
}

serverNames = {
  'jetty' : '72.0.0.0',
  'bob'   : '71.0.0.0'
}

for each ( list as server => domain ) { 
    headRequest = http.head( domain )
    if ( headRequest.response == 200 && http.arecord(domain) == server ) { 
       print serverNames[server] + ' is up ';
    } else {
       print 'Server is either down or the site attached to the server lookup has moved.';
    }
}

I'll probably write the script in Python or PHP, but this question is to discuss the logic only.
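
For reference, the pseudocode above could be sketched in Python roughly as follows. This is only an illustration using the standard library: the `SERVERS` layout and the function names `head_status`, `a_records`, `classify`, and `check_all` are assumptions made for this sketch, and the IPs and domains are the placeholders from the pseudocode.

```python
import http.client
import socket

# Hypothetical inventory: server IP -> (server nickname, one site hosted there).
SERVERS = {
    "72.0.0.0": ("jetty", "domain.com"),
    "71.0.0.0": ("bob", "blah.com"),
}


def head_status(domain, timeout=5.0):
    """Return the HTTP status of a HEAD request for '/', or None on failure."""
    conn = http.client.HTTPConnection(domain, 80, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return None
    finally:
        conn.close()


def a_records(domain):
    """Return the set of IPv4 addresses the domain resolves to (empty on failure)."""
    try:
        return set(socket.gethostbyname_ex(domain)[2])
    except OSError:
        return set()


def classify(status, addresses, expected_ip):
    """Combine the two checks from the pseudocode into one verdict."""
    if expected_ip not in addresses:
        return "moved"  # the A record no longer points at this server
    if status == 200:
        return "up"
    return "down"  # no response, or any status other than 200


def check_all(servers=SERVERS):
    """Print one verdict per server; this performs real network requests."""
    for ip, (name, domain) in servers.items():
        verdict = classify(head_status(domain), a_records(domain), ip)
        print(f"{name} ({ip}) via {domain}: {verdict}")
```

Calling `check_all()` would run the checks against the inventory. As in the pseudocode, anything other than 200 (including a redirect) is reported as not up.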

+3  A: 

The logic seems sound to me. I would suggest extending it to use the mod_status module on your Apache servers. After doing your HEAD check, also try to grab /server-status and use that to double-check that the server is healthy. I'm thinking something like this:

statusRequest = http.get( domain + '/server-status' )
if ( statusRequest.response == 200 ) {
    // Make sure you don't have insane load averages, etc.
} else {
    // Check something IIS-specific, or just be happy the HEAD check worked
}
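
A rough Python sketch of that status check follows. It assumes mod_status is enabled and that the monitoring host is allowed to read `/server-status?auto`, the machine-readable variant that returns plain `Key: value` lines. The helper names and the `min_idle` threshold are made up for this example, and the exact set of fields can differ between Apache versions, so a missing field is treated as "no information" rather than as a failure.

```python
import http.client


def parse_status_auto(body):
    """Parse 'Key: value' lines from mod_status machine-readable output."""
    fields = {}
    for line in body.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            fields[key.strip()] = value.strip()
    return fields


def workers_exhausted(fields, min_idle=1):
    """True if fewer than min_idle workers are idle; False if the field is absent."""
    idle = fields.get("IdleWorkers")
    if idle is None or not idle.isdigit():
        return False
    return int(idle) < min_idle


def fetch_server_status(host, timeout=5.0):
    """GET /server-status?auto; return parsed fields, or None if unavailable."""
    conn = http.client.HTTPConnection(host, 80, timeout=timeout)
    try:
        conn.request("GET", "/server-status?auto")
        resp = conn.getresponse()
        if resp.status != 200:
            return None  # e.g. IIS, or mod_status not exposed to this client
        return parse_status_auto(resp.read().decode("latin-1"))
    except (OSError, http.client.HTTPException):
        return None
    finally:
        conn.close()
```

A `None` result from `fetch_server_status` corresponds to the `else` branch above: fall back to the plain HEAD check.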

Another technique I've used in the past is to also request something that I know should return a 404. That way you have a better chance of noticing if the server is just handing out 200s to anyone who asks. (I saw that happen once due to a bad config.)
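
The 404 check could look something like the sketch below. The random path prefix `/healthcheck-` and the function names are invented for this illustration; the idea is simply to request a path that almost certainly does not exist and compare its status with that of a real page.

```python
import http.client
import uuid


def status_for(host, path, timeout=5.0):
    """Return the status of a HEAD request for path, or None on failure."""
    conn = http.client.HTTPConnection(host, 80, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return None
    finally:
        conn.close()


def random_missing_path():
    """Build a path that should not exist on any real site."""
    return "/healthcheck-" + uuid.uuid4().hex


def interpret(real_status, missing_status):
    """Judge the server from a real page and a deliberately missing one."""
    if real_status != 200:
        return "down"
    if missing_status == 404:
        return "up"
    return "suspect"  # 200 for the real page, but the bogus path did not 404


def check_site(host):
    """Run both requests against host; this performs real network requests."""
    return interpret(status_for(host, "/"), status_for(host, random_missing_path()))
```

A "suspect" verdict does not prove the server is broken (some sites serve custom error pages with a 200, or redirect unknown paths), but it flags a site worth looking at by hand.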

speshak