I know you're asking about pycurl, but I find it too hard and unpythonic to use. The API is weird.
Here's a Twisted example instead:
from twisted.web.client import Agent
from twisted.internet import reactor, defer

def get_headers(response, url):
    '''Extract a dict of headers from the response'''
    return url, dict(response.headers.getAllRawHeaders())

def got_everything(all_headers):
    '''Print results and end the program'''
    print dict(all_headers)
    reactor.stop()

agent = Agent(reactor)
urls = (line.strip() for line in open('urls.txt'))
reqs = [agent.request('HEAD', url).addCallback(get_headers, url)
        for url in urls if url]
defer.gatherResults(reqs).addCallback(got_everything)
reactor.run()
This example starts all the requests asynchronously and gathers the results once every request has completed. Here's the output for a file with three URLs:
{'http://debian.org': {'Content-Type': ['text/html; charset=iso-8859-1'],
'Date': ['Thu, 04 Mar 2010 13:27:25 GMT'],
'Location': ['http://www.debian.org/'],
'Server': ['Apache'],
'Vary': ['Accept-Encoding']},
'http://google.com': {'Cache-Control': ['public, max-age=2592000'],
'Content-Type': ['text/html; charset=UTF-8'],
'Date': ['Thu, 04 Mar 2010 13:27:25 GMT'],
'Expires': ['Sat, 03 Apr 2010 13:27:25 GMT'],
'Location': ['http://www.google.com/'],
'Server': ['gws'],
'X-Xss-Protection': ['0']},
'http://stackoverflow.com': {'Cache-Control': ['private'],
'Content-Type': ['text/html; charset=utf-8'],
'Date': ['Thu, 04 Mar 2010 13:27:24 GMT'],
'Expires': ['Thu, 04 Mar 2010 13:27:25 GMT'],
'Server': ['Microsoft-IIS/7.5']}}
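If you're on Python 3 with a current Twisted release, the same approach still works, but Agent.request() expects the method and URL as bytes, getAllRawHeaders() gives you bytes back, and print is a function. A minimal sketch of the adjusted version (same structure as above, just decoding added; untested against your exact setup):

from twisted.web.client import Agent
from twisted.internet import reactor, defer

agent = Agent(reactor)

def get_headers(response, url):
    '''Pair the URL with a dict of decoded response headers'''
    headers = {name.decode(): [v.decode() for v in values]
               for name, values in response.headers.getAllRawHeaders()}
    return url, headers

def got_everything(all_headers):
    '''Print the combined results and stop the reactor'''
    print(dict(all_headers))
    reactor.stop()

urls = (line.strip() for line in open('urls.txt'))
reqs = [agent.request(b'HEAD', url.encode()).addCallback(get_headers, url)
        for url in urls if url]
defer.gatherResults(reqs).addCallback(got_everything)
reactor.run()

Note that, as in the original, if any single request fails the gatherResults Deferred errbacks and the reactor won't be stopped; add an errback if you need the script to exit regardless.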