My Python application makes many HTTP requests to many URLs using urllib2. I would like to build a unit test suite to test my data parsing and error handling code.

I have a directory of test data files, each containing a single HTTP response with headers and response data (saved using curl -i). Some of these files contain HTTP error responses, which I need in order to test the error handling.

Ideally, I would like to create a mock object to replace urllib2.urlopen and return a mock response object.

I'm wondering if there is an easy way to have urllib2 load an HTTP response directly from a file and parse it into the appropriate response object (as if the response had been read from a URL).

I tried using URLs constructed with the "file://" protocol; however, the HTTP response headers at the top of the file were neither read nor parsed properly.

Alternatively, I am considering writing a small web server class to serve the test files; however, this seems like a little more work than I'd like. It would be easier to have urllib2 somehow reconstruct the response object from the HTTP responses I've already saved in files (without having to build a web server to serve them again).

Any ideas?

+1  A: 

I think the best approach is to mock a subset of httplib.HTTPConnection (call the resulting class mockcon for concreteness in the following) and add a handler using it and subclassing HTTPHandler (to use in build_opener -- the subclassing means it can replace HTTPHandler that build_opener uses by default):

import urllib2

class MockHTTPHandler(urllib2.HTTPHandler):

    def http_open(self, req):
        # do_open instantiates mockcon where it would normally use httplib.HTTPConnection
        return self.do_open(mockcon, req)

The mockcon class must supply the methods that do_open calls. Several can be dummies (i.e. accept and ignore arbitrary args and kwds and do nothing):

set_debuglevel
_set_tunnel
request

(may be interested in the 2nd arg of request, as it gives the "selector" part of the URL).

The __init__ method of mockcon gets the host part of the URL as the first arg (i.e., first after self of course) and should ignore following kwds (used to set a timeout).

The getresponse method of mockcon (no args beyond self, of course) must return an HTTP response object, i.e., a file-like readable object which also has attributes .msg, .status, and .reason, and a method get_full_url() to return the URL.

You could use an actual httplib.HTTPResponse instance for the latter role, but you must initialize it with one mock/dummy arg that has a makefile method (which ignores its args and kwds and returns whatever), and, right after initializing it, reset its .fp attribute to be a file opened in rb mode giving exactly the bytes that a real HTTP response would receive on its socket.
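The pieces described above can be put together roughly as follows. This is only a sketch, and it is written for Python 3, where urllib2 and httplib became urllib.request and http.client and the tunnel method is named set_tunnel. FakeSocket, MockConnection, its saved lookup table and build_mock_opener are names made up for illustration. Instead of resetting .fp afterwards, the dummy socket's makefile returns the saved bytes directly, and begin() is called so that the status line and headers are parsed before do_open looks at them.

```python
import io
import urllib.request
from http.client import HTTPResponse


class FakeSocket:
    """Dummy socket: HTTPResponse only calls makefile() on it."""

    def __init__(self, raw_bytes):
        self._raw = raw_bytes

    def makefile(self, *args, **kwargs):
        # Ignore the requested mode; hand back the saved response bytes.
        return io.BytesIO(self._raw)


class MockConnection:
    """Stands in for HTTPConnection and replays saved responses from disk."""

    # Hypothetical lookup table: (host, selector) -> path of a saved response.
    saved = {}

    def __init__(self, host, **kwargs):    # kwargs carries the timeout
        self.host = host
        self.sock = None                   # Python 3's do_open reads this attribute
        self.method = None
        self.selector = None

    def set_debuglevel(self, *args, **kwargs):
        pass

    def set_tunnel(self, *args, **kwargs):
        pass

    def close(self):
        pass

    def request(self, method, selector, *args, **kwargs):
        # Remember what was asked for; nothing is sent anywhere.
        self.method = method
        self.selector = selector

    def getresponse(self):
        with open(self.saved[(self.host, self.selector)], "rb") as f:
            raw = f.read()
        response = HTTPResponse(FakeSocket(raw), method=self.method)
        response.begin()                   # parse status line and headers
        return response


class MockHTTPHandler(urllib.request.HTTPHandler):
    def http_open(self, req):
        return self.do_open(MockConnection, req)


def build_mock_opener():
    # ProxyHandler({}) keeps proxy environment variables out of the tests.
    return urllib.request.build_opener(
        urllib.request.ProxyHandler({}), MockHTTPHandler)
```

A test would fill MockConnection.saved with entries such as ("example.test", "/ok") pointing at the saved files, then call build_mock_opener().open("http://example.test/ok"). A saved 404 response should surface as urllib.error.HTTPError, just as with a live server.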

I think that building a full-fledged mock for the whole urllib2.urlopen call might be simpler than this attempt to reuse most of the functionality of urllib2 (and httplib which it uses internally), though perhaps not quite as simple as the "local web server" approach which you appear to think is more work. But it's worth considering all three approaches (the mock would surely be the most lightweight and fastest in operation, the local web server the slowest... and the server would also require somehow modifying the URLs by prefixing an http://localhost:someport/ to them, of course).
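For comparison, a full mock of the urlopen call itself might look something like the sketch below. It assumes Python 3 (urllib.request instead of urllib2) and uses unittest.mock to swap the function out; fetch_text, FakeResponse and fake_urlopen are hypothetical names, and the canned data stands in for whatever a real suite would read from the saved files.

```python
import io
import urllib.error
import urllib.request
from unittest import mock


def fetch_text(url):
    """Hypothetical code under test: fetch a URL, report HTTP errors as text."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        return "error %d" % err.code


class FakeResponse(io.BytesIO):
    """Minimal stand-in for the object urlopen returns."""

    def __init__(self, body, code=200, url=""):
        super().__init__(body)
        self.code = code
        self.url = url

    def getcode(self):
        return self.code

    def geturl(self):
        return self.url


def fake_urlopen(url, *args, **kwargs):
    # Hypothetical canned data; a real suite would load the saved files here.
    if url.endswith("/missing"):
        raise urllib.error.HTTPError(
            url, 404, "Not Found", {}, io.BytesIO(b"not found"))
    return FakeResponse(b"hello", url=url)


# Replace urlopen only for the duration of the with block.
with mock.patch("urllib.request.urlopen", fake_urlopen):
    ok_text = fetch_text("http://example.test/ok")
    error_text = fetch_text("http://example.test/missing")
```

The trade-off is that none of urllib's own header parsing or error dispatch gets exercised, so the fake has to decide by itself when to raise HTTPError.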

Alex Martelli
A: 

The server approach is definitely not more work; it's probably the easiest of all your alternatives.

Check out: http://docs.python.org/library/simplehttpserver.html

It's a 7-line Python program that, when run from a certain directory, will serve up all the files (and, recursively, any files in subdirectories) over HTTP.

You could probably have your unit test code start and stop the server so you don't need to leave it running even when not testing.
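Starting and stopping the server from test code could be sketched like this. It assumes Python 3.7 or later, where SimpleHTTPServer lives in http.server and the request handler accepts a directory argument; start_test_server and stop_test_server are made-up helper names. Note that this kind of server sends its own status line and headers and returns each file's contents as the body.

```python
import functools
import http.server
import threading


def start_test_server(directory):
    """Serve `directory` on a free localhost port in a background thread."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory)
    # Port 0 asks the OS for any free port; read it from server.server_address.
    server = http.server.HTTPServer(("127.0.0.1", 0), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server, thread


def stop_test_server(server, thread):
    server.shutdown()        # makes serve_forever return
    server.server_close()    # releases the listening socket
    thread.join()
```

In a unittest suite these two calls would naturally go in setUp and tearDown (or setUpClass and tearDownClass), with the test URLs built from server.server_address.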

entropy