I've written my first Python application with the App Engine APIs. It is intended to monitor a list of servers and notify me when one of them goes down, by sending a message to my iPhone using Prowl, by sending me an email, or both.

The problem is that a few times a week it notifies me that a server is down even when it clearly isn't. I've tested it with servers I know should be up virtually all the time, like google.com or amazon.com, but I get notifications for them too.

I've got a copy of the code running at http://aeservmon.appspot.com. You can see that google.com was added on Jan 3rd but is only listed as having been up for 6 days.

Below is the relevant section of the code from checkservers.py that does the checking with urlfetch. I assumed that the DownloadError exception would only be raised when the server couldn't be contacted, but perhaps I'm wrong.

What am I missing?

Full source is on GitHub under mrsteveman1/aeservmon (I can only post one link as a new user, sorry!)

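# Imports this excerpt relies on (assumed to sit at the top of checkservers.py)
import logging
from google.appengine.api import urlfetch
from google.appengine.api.urlfetch import DownloadError

def testserver(self, server):
    # Pick the scheme from the server's ssl flag
    if server.ssl:
        prefix = "https://"
    else:
        prefix = "http://"
    try:
        url = prefix + "%s" % server.serverdomain
        result = urlfetch.fetch(url, headers = {'Cache-Control' : 'max-age=30'} )
    except DownloadError:
        # Any DownloadError is reported as "down" with status code 0
        logging.info('%s could not be reached' % server.serverdomain)
        self.serverisdown(server, 0)
        return
    # Only a 500 response counts as down; every other status counts as up
    if result.status_code == 500:
        logging.info('%s returned 500' % server.serverdomain)
        self.serverisdown(server, result.status_code)
    else:
        logging.info('%s is up, status code %s' % (server.serverdomain, result.status_code))
        self.serverisup(server, result.status_code)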

UPDATE Jan 21:

Today I found one of the exceptions in the logs:

ApplicationError: 5 
Traceback (most recent call last):
  File "/base/python_lib/versions/1/google/appengine/ext/webapp/__init__.py", line 507, in __call__
    handler.get(*groups)
  File "/base/data/home/apps/aeservmon/1.339312180538855414/checkservers.py", line 149, in get
    self.testserver(server)
  File "/base/data/home/apps/aeservmon/1.339312180538855414/checkservers.py", line 106, in testserver
    result = urlfetch.fetch(url, headers = {'Cache-Control' : 'max-age=30'} )
  File "/base/python_lib/versions/1/google/appengine/api/urlfetch.py", line 241, in fetch
    return rpc.get_result()
  File "/base/python_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 501, in get_result
    return self.__get_result_hook(self)
  File "/base/python_lib/versions/1/google/appengine/api/urlfetch.py", line 331, in _get_fetch_result
    raise DownloadError(str(err))
DownloadError: ApplicationError: 5 
A: 

Other folks have been reporting issues with the fetch service (e.g. http://code.google.com/p/googleappengine/issues/detail?id=1902&q=urlfetch&colspec=ID%20Type%20Status%20Priority%20Stars%20Owner%20Summary%20Log%20Component).

Can you print the exception? It may have more detail, e.g.:

"DownloadError: ApplicationError: 2 something bad"
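
If it helps, here is a minimal sketch of pulling the numeric code out of a message shaped like the one above, so it can be logged alongside the server name. The helper name `application_error_code` is hypothetical and not part of the App Engine API, and the plain `Exception` is only a stand-in for the `DownloadError` you would catch around `urlfetch.fetch()`:

```python
import re

def application_error_code(exc):
    """Return the number from a message such as 'ApplicationError: 2 ...',
    or None when the message carries no such code."""
    match = re.search(r'ApplicationError:\s*(\d+)', str(exc))
    return int(match.group(1)) if match else None

# Stand-in exception using the example message; in the real handler
# this would be the DownloadError instance from the except clause.
err = Exception('ApplicationError: 2 something bad')
code = application_error_code(err)
```

Inside the existing `except` block you could then log `code` together with `str(err)` instead of only the fixed "could not be reached" text.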
jspcal
The most recent "false positive" was at 7:04 EST, just under an hour ago, and the App Engine log contains only the messages I triggered with logging.info(). I could try letting the exception go unhandled, but I believe I tried that a few weeks ago and only saw the DownloadError exception message with little else. I'll try it again right now, though; hopefully the exception triggers again tonight.
mrsteveman1
Updated the question with one of the exceptions that occurred today after I removed the exception handling. I had not seen ApplicationError: 5 before; apparently it means the request did not return within the time limit for urlfetch?
mrsteveman1
Yep, exactly right... The fetch service may have occasional latency (assuming the target server is not the issue). You can set the `deadline` parameter to something like 60 seconds in `fetch()`.
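
A rough sketch of passing `deadline` through and keeping a deadline error apart from a real outage, which is one possible way to avoid the false alarms. `check` and the injected `fetch` argument are hypothetical names chosen so the snippet runs outside App Engine; in the handler you would pass `urlfetch.fetch` and catch `DownloadError` rather than `Exception`, and the deadline value is left to the caller:

```python
import re

def check(fetch, url, deadline):
    """Classify one probe as 'up', 'down' or 'timeout'.

    fetch is any callable accepting (url, deadline=...) in the style of
    urlfetch.fetch; it is injected so this sketch runs anywhere.
    """
    try:
        result = fetch(url, deadline=deadline)
    except Exception as exc:  # assumption: DownloadError in the real handler
        # "ApplicationError: 5" is the deadline error from the question's traceback
        if re.search(r'ApplicationError:\s*5\b', str(exc)):
            return 'timeout'
        return 'down'
    # Same rule as the question's code: only a 500 counts as down
    return 'down' if result.status_code == 500 else 'up'
```

A 'timeout' result could be logged without calling `serverisdown()`, so a slow fetch does not reset the uptime counter.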
jspcal
It appears the max for deadline is 10 seconds, which I'll try, but I may have to look into doing async requests or using a separate library for the requests, like httplib. Thanks for the help! Shall I mark this answer correct?
mrsteveman1