In Python, I'm using httplib because it keeps the HTTP connection alive (as opposed to urllib/urllib2). Now I want to use cookielib with httplib, but they seem to hate each other: there is no obvious way to interface them.

Does anyone know of a solution to that problem?

+2  A: 

HTTP handler for urllib2 that supports keep-alive

jitter
link update: http://urlgrabber.baseurl.org/help/urlgrabber.keepalive.html
mykhal
+1  A: 

HACK ALERT! :)

I'd go with the other suggested approach, but I did write a hack (for different reasons, though) that creates an interface between httplib and cookielib.

What I did was create a fake HTTPRequest with the minimal set of methods required for CookieJar to recognize it and process cookies as needed. I then used that fake request object, setting on it all the data cookielib needs.

Here is the code of the class:

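class HTTPRequest( object ):
    """
    Data container for HTTP request (used for cookie processing).
    """

    def __init__( self, host, url, headers=None ):
        # host is a (hostname, port) tuple; url is the request path
        self._host = host
        self._url = url
        self._headers = {}
        # avoid a mutable default argument; None means "no headers"
        for key, value in (headers or {}).items():
            self.add_header(key, value)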

    def has_header( self, name ):
        return name in self._headers

    def add_header( self, key, val ):
        self._headers[key.capitalize()] = val

    def add_unredirected_header(self, key, val):
        self._headers[key.capitalize()] = val

    def is_unverifiable( self ):
        return True

    def get_type( self ):
        # TODO: implement other protocols support
        return 'https'

    def get_full_url( self ):
        # TODO: implement other protocols support
        return 'https://' + self._host[0] + ":" + str(self._host[1]) + self._url

    def get_header( self, header_name, default=None ):
        return self._headers.get( header_name, default )

    def get_host( self ):
        return self._host[0]

    get_origin_req_host = get_host

    def get_headers( self ):
        return self._headers

Please note that the class supports the HTTPS protocol only (all I needed at the time).
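
If plain HTTP is needed as well, the two TODO methods could take the scheme from the constructor. This is only a sketch under assumptions: SchemeAwareRequest, the scheme parameter and the DEFAULT_PORTS table are hypothetical names that are not part of the class above, and leaving the default port out of the URL is a design choice of this sketch (the original always includes it).

```python
class SchemeAwareRequest(object):
    """Hypothetical variant of HTTPRequest; only the scheme handling is shown."""

    # Assumption: only these two schemes are of interest.
    DEFAULT_PORTS = {'http': 80, 'https': 443}

    def __init__(self, host, url, scheme='https'):
        if scheme not in self.DEFAULT_PORTS:
            raise ValueError('unsupported scheme: %r' % (scheme,))
        self._host = host      # (hostname, port) tuple, as in the original class
        self._url = url
        self._scheme = scheme

    def get_type(self):
        return self._scheme

    def get_full_url(self):
        hostname, port = self._host
        netloc = hostname
        # Only spell out the port when it is not the scheme's default.
        if port != self.DEFAULT_PORTS[self._scheme]:
            netloc += ':' + str(port)
        return self._scheme + '://' + netloc + self._url
```

The remaining methods (has_header, add_header and so on) would stay as they are in the original class.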

The code that used this class looked like this (note another hack to make the response compatible with cookielib):

import httplib
from cookielib import CookieJar

cookies = CookieJar()

headers = {
    # headers needed
}

# construct fake request
request = HTTPRequest( host, request_url, headers )

# add cookies to fake request
cookies.add_cookie_header(request)

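# issue an HTTP request using cookies and headers from the fake request
# (connection is an already-open connection, e.g. an httplib.HTTPSConnection;
# method is the HTTP verb, e.g. 'GET')
connection.request(method, request_url, body, request.get_headers())

response = connection.getresponse()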

if response.status == httplib.OK:
    # HACK: pretend we're urllib2 response
    response.info = lambda : response.msg

    # read and store cookies from response
    cookies.extract_cookies(response, request)

    # process response...
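
The round trip above can be tried without a network connection. The following is a minimal sketch, not the code from this answer: it assumes Python 3, where cookielib is named http.cookiejar and, as far as I can tell, the jar reads the attributes host, type, unverifiable and origin_req_host rather than calling the get_* methods. FakeRequest, FakeResponse, the sid cookie and the example.com hosts are hypothetical and used only for illustration.

```python
import email
from http.cookiejar import CookieJar  # Python 3 name of cookielib


class FakeRequest(object):
    """Hypothetical stand-in for the HTTPRequest class above (HTTPS only)."""

    def __init__(self, hostname, port, path):
        # Attributes the jar is assumed to read from a request in Python 3.
        self.host = hostname
        self.type = 'https'
        self.unverifiable = True
        self.origin_req_host = hostname
        self._url = 'https://%s:%d%s' % (hostname, port, path)
        self._headers = {}

    def get_full_url(self):
        return self._url

    def has_header(self, name):
        return name in self._headers

    def get_header(self, name, default=None):
        return self._headers.get(name, default)

    def add_unredirected_header(self, key, val):
        self._headers[key.capitalize()] = val

    def get_headers(self):
        return self._headers


class FakeResponse(object):
    """Hypothetical response that exposes only info(), as the hack above does."""

    def __init__(self, raw_headers):
        self._msg = email.message_from_string(raw_headers)

    def info(self):
        return self._msg


cookies = CookieJar()

# 1. The server sets a cookie on the first response.
first = FakeRequest('example.com', 443, '/login')
cookies.extract_cookies(FakeResponse('Set-Cookie: sid=abc123; Path=/\n\n'), first)

# 2. The jar adds the cookie to the next request for the same host.
second = FakeRequest('example.com', 443, '/account')
cookies.add_cookie_header(second)

# 3. A request for an unrelated host gets no Cookie header.
other = FakeRequest('example.org', 443, '/')
cookies.add_cookie_header(other)
```

After this runs, second.get_headers() carries the Cookie header that would be passed to connection.request(), while other.get_headers() stays empty.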
Serge Broslavsky
This hack just saved me a couple hours of reimplementing more or less the same thing myself. Thank you.
Zack
Welcome, Zack! That's the main reason for sharing. :)
Serge Broslavsky
A: 

To keep everything consistent, you may want to replace every call to capitalize() with lower(), and have has_header and get_header convert the key they look up to lower case as well.
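
A minimal sketch of that suggestion, showing only the header bookkeeping (LowercaseHeaders is a hypothetical name, not part of the answer above). The reason it helps: capitalize() turns 'Content-Type' into 'Content-type', so a later lookup with the original spelling misses.

```python
class LowercaseHeaders(object):
    """Hypothetical header store that normalizes every key with lower()."""

    def __init__(self, headers=None):
        self._headers = {}
        for key, value in (headers or {}).items():
            self.add_header(key, value)

    def add_header(self, key, val):
        self._headers[key.lower()] = val

    # Same storage for both kinds of header, as in the original class.
    add_unredirected_header = add_header

    def has_header(self, name):
        return name.lower() in self._headers

    def get_header(self, name, default=None):
        return self._headers.get(name.lower(), default)

    def get_headers(self):
        return self._headers


headers = LowercaseHeaders({'Content-Type': 'text/plain'})
found = headers.get_header('CONTENT-TYPE')   # lookup is now case-insensitive
```

Note that get_headers() then returns lower-case names; HTTP header names are case-insensitive, so sending them that way should be acceptable.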

Seraj Ahmad