Hello everyone,

I would like to read a website asynchronously, which isn't possible with urllib as far as I know. I tried reading with plain sockets, but HTTP is giving me hell. I run into all kinds of funky encodings, for example Transfer-Encoding: chunked, and have to parse all that stuff manually. I feel like I'm coding C, not Python, at the moment.
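Here is what parsing chunked encoding by hand involves. Each chunk is a hexadecimal size line ending in CRLF, followed by that many bytes of data and another CRLF, and a chunk of size zero ends the body (RFC 9112, section 7.1). The code below is a minimal decoder sketch. It assumes the complete raw body is already buffered in memory. `decode_chunked` is a hypothetical helper name, chunk extensions are skipped, and trailer fields are ignored.

```python
def decode_chunked(body: bytes) -> bytes:
    """Decode a 'Transfer-Encoding: chunked' message body.

    Assumes the whole body is already in memory; raises ValueError
    on truncated or malformed input.
    """
    out = bytearray()
    pos = 0
    while True:
        # Chunk header: hex size, optional ';extension', then CRLF.
        line_end = body.find(b"\r\n", pos)
        if line_end == -1:
            raise ValueError("truncated chunk-size line")
        size_field = body[pos:line_end].split(b";", 1)[0].strip()
        size = int(size_field, 16)  # ValueError if not valid hex
        pos = line_end + 2
        if size == 0:
            # Last chunk; any trailer fields after it are ignored here.
            return bytes(out)
        data = body[pos:pos + size]
        # The chunk data must be complete and followed by CRLF.
        if len(data) != size or body[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("truncated or malformed chunk data")
        out += data
        pos += size + 2
```

For example, `decode_chunked(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n")` returns `b"Wikipedia"`. A real client would also have to cope with a body arriving across several socket reads, which is one reason to prefer an existing library.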

Isn't there a nicer way, something like urllib but asynchronous? I don't really feel like re-implementing the whole HTTP specification when it's all been done before.

Twisted isn't an option currently.

Greetings,

Tom

+2  A: 

You can implement an asynchronous call yourself. For each call, start a new thread (or try to get one from a pool) and use a callback to process the result.

You can do this very nicely with a decorator:

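import functools
import threading
import urllib.request

def threaded(callback=lambda *args, **kwargs: None, daemonic=False):
    """Decorate a function to run in its own thread and report the result
    by calling callback with it."""
    def innerDecorator(func):
        @functools.wraps(func)
        def inner(*args, **kwargs):
            # Run func in a new thread and pass its return value to callback.
            target = lambda: callback(func(*args, **kwargs))
            t = threading.Thread(target=target)
            t.daemon = daemonic  # setDaemon() is deprecated
            t.start()
            return t  # returned so the caller can join() if needed
        return inner
    return innerDecorator

@threaded()
def get_webpage(url):
    data = urllib.request.urlopen(url).read()
    print(data)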
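The answer mentions taking threads from a pool. In Python 3, the standard `concurrent.futures` module provides one. The sketch below applies the same thread-plus-callback idea on a `ThreadPoolExecutor`. `fetch_url` and `fetch_async` are hypothetical helper names, and the `(url, data, error)` callback signature is an assumption of this sketch. It still uses threads, not asynchronous sockets.

```python
import urllib.request
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Optional


def fetch_url(url: str, timeout: float = 10.0) -> bytes:
    """Blocking download; meant to run inside a pool worker thread."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_async(
    pool: ThreadPoolExecutor,
    url: str,
    callback: Callable[[str, Optional[bytes], Optional[BaseException]], None],
    fetch: Callable[[str], bytes] = fetch_url,
) -> Future:
    """Run fetch(url) on a pooled thread and report (url, data, error) to callback."""
    future = pool.submit(fetch, url)

    def on_done(f: Future) -> None:
        # Runs in the worker thread, or immediately if f has already finished.
        error = f.exception()
        data = f.result() if error is None else None
        callback(url, data, error)

    future.add_done_callback(on_done)
    return future
```

A typical use is to call `fetch_async(pool, url, report)` for each URL inside a `with ThreadPoolExecutor(max_workers=4) as pool:` block. Leaving the block waits for every download and its callback to finish. The `fetch` parameter exists only so the download function can be swapped out, for example in tests.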
bayer
Sorry, as I said, I want asynchronous sockets, not threads.
Tom
+2  A: 

Have you looked at http://asynchttp.sourceforge.net/?

"Asynchronous HTTP Client for Python

The 'asynchttp' module is a logical extension of the Python library 'asynchat' module which is built on the 'asyncore' and 'select' modules. Our goal is to provide the functionality of the excellent 'httplib' module without using blocking sockets."

The project's last commit was 2001-05-29, so it looks dead. But it might be of interest anyway.

Disclaimer: I have not used it myself.

Also, this blog post has some information on async HTTP.
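asynchttp depends on the asyncore and asynchat modules, which are gone from modern Python; both were removed in Python 3.12. The standard-library successor for non-blocking sockets is asyncio. The sketch below performs a thread-free GET with asyncio streams and handles plain `http://` URLs only. `http_get` and `fetch_all` are hypothetical helpers, not a library API. To sidestep chunked encoding, the request uses HTTP/1.0, which means a compliant server should send the body unchunked and close the connection when it is done. Redirects, HTTPS and header parsing are left out.

```python
import asyncio
from urllib.parse import urlsplit


async def http_get(url: str) -> tuple[int, bytes]:
    """Fetch url over a non-blocking socket; return (status code, body)."""
    parts = urlsplit(url)
    if parts.scheme != "http" or not parts.hostname:
        raise ValueError("this sketch only handles plain http:// URLs")
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    reader, writer = await asyncio.open_connection(parts.hostname, parts.port or 80)
    try:
        # HTTP/1.0 + Connection: close: a compliant server will not use chunked
        # encoding and closes the connection after the body, so EOF marks the end.
        request = (
            f"GET {path} HTTP/1.0\r\n"
            f"Host: {parts.netloc}\r\n"
            "Connection: close\r\n"
            "\r\n"
        )
        writer.write(request.encode("ascii"))
        await writer.drain()
        raw = await reader.read()  # read until the server closes the connection
    finally:
        writer.close()
        await writer.wait_closed()

    head, _, body = raw.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0]  # e.g. b"HTTP/1.0 200 OK"
    return int(status_line.split()[1]), body


async def fetch_all(urls):
    """Run several GETs concurrently on a single thread."""
    return await asyncio.gather(*(http_get(u) for u in urls))
```

Calling `asyncio.run(fetch_all(["http://example.com/"]))` returns a list of `(status, body)` tuples. The event loop switches between sockets while each one waits on I/O, so no threads are involved.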

codeape
+1  A: 

Hi,

The furthest I got was using a modified version of asynchttp, which codeape suggested. I have tried both asyncore/asynchat and asynchttp, with a lot of pain. It took me far too long to fix all the bugs in it: its handle_read method is nearly copied from asyncore, only badly indented, and it gave me headaches with chunked encoding. Also, according to some hints I found on Google, asyncore and asynchat are best avoided.

I have settled on Twisted, but that's obviously out of the question for you.

It also depends on what you are trying to do with your application: why you want async requests, whether threads are an option or not, and whether you're doing GUI programming or something else. If you could share some more information, that would help. Otherwise, I'd vote for the threaded version suggested above; it is much more readable and maintainable.