I've been scouring the Internet looking for a solution to my problem with Python. I'm trying to use a urllib2 connection to read a potentially endless stream of data from an HTTP server. It's part of some interactive communication, so it's important that I can get the data that's available, even if it's not a whole buffer full. There seems to be no way to have read/readline return the available data: they block forever waiting for the entire (endless) stream before returning.

Even if I set the underlying file descriptor to non-blocking using fcntl, the urllib2 file-object still blocks!! In general there seems to be no way to make Python file objects, upon read, return all available data if there is some and block otherwise.
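
Roughly what that attempt looks like (the URL is a placeholder, and it assumes the response object exposes the descriptor via fileno()):

import fcntl
import os
import urllib2

conn = urllib2.urlopen("http://example.com/stream")  # placeholder URL

# Put the underlying descriptor into non-blocking mode.
fd = conn.fileno()  # assumes the response exposes the socket's descriptor
flags = fcntl.fcntl(fd, fcntl.F_GETFL)
fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

# conn.read() / conn.readline() still block: the file-object layer keeps
# looping on the socket until it fills its buffer or sees EOF.
data = conn.readline()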

I've seen a few posts from people seeking help with this, but no solutions. What gives? Am I missing something? This seems like such a normal use case to ruin completely! I'm hoping to use urllib2's ability to detect configured proxies and handle chunked encoding, but I can't if it won't cooperate.

Edit: Upon request, here is some example code

Client:

connection = urllib2.urlopen(commandpath)
id = connection.readline()

Now suppose that the server is using chunked transfer encoding, and writes one chunk down the stream and the chunk contains the line, and then waits. The connection is still open, but the client has data waiting in a buffer.

I cannot get read or readline to return the data that I know is waiting in the buffer, because they try to read until the end of the connection. In this case the connection may never close, so they will wait either forever or until an inactivity timeout occurs, severing the connection. Once the connection is severed they will return, but that's obviously not the behavior I want.

+1  A: 

urllib2 operates at the HTTP level, which works with complete documents. I don't think there's a way around that without hacking into the urllib2 source code.

What you can do is use plain sockets (you'll have to talk HTTP yourself in this case) and call sock.recv(maxbytes), which reads only the data that is currently available.
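
A minimal sketch of that approach (host and path are placeholders; it assumes a direct connection with no proxy and no chunked decoding):

import socket

sock = socket.create_connection(("example.com", 80))  # placeholder host
sock.sendall("GET /stream HTTP/1.1\r\nHost: example.com\r\n\r\n")

# recv() returns whatever has arrived (up to 4096 bytes here) and only
# blocks when nothing is available yet. The first calls will also include
# the response headers, since we're talking HTTP by hand.
while True:
    data = sock.recv(4096)
    if not data:
        break  # server closed the connection
    print repr(data)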

Update: you may want to try calling conn.fp._sock.recv(maxbytes) instead of conn.read(maxbytes) on an urllib2 connection.
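
Something along these lines, though fp and _sock are private attributes and how deeply the real socket is nested varies between Python versions, so treat this as a sketch:

import urllib2

conn = urllib2.urlopen("http://example.com/stream")  # placeholder URL

# Reach past the file-object wrappers to the raw socket and ask it for
# whatever bytes are currently available (up to 4096 here).
data = conn.fp._sock.recv(4096)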

Wim
The point of using the urllib2 connection is that urllib2 already supports proxies configured in the environment and chunked encoding, things I'm not too excited about implementing myself. I feel like if I could just kick something in the pants at the lowest level, everything would work...
jdizzle
Right, I wouldn't want to start implementing all of those myself either. Did the `conn.fp._sock.recv(maxbytes)` trick do you any good?
Wim
I actually did end up using conn.fp._sock.fp._sock or something crazy like that. I had to implement a chunked decoder, but that's not actually that difficult. It was the proxy handling that really scared me, and this way I didn't have to deal with it.
jdizzle
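
For reference, a minimal sketch of the kind of chunked decoder mentioned in that last comment; it ignores trailers and chunk extensions, and buf is whatever raw bytes recv() has delivered so far:

def decode_chunks(buf):
    # Decode as many complete chunks as possible from buf.
    # Returns (payload, leftover, done): the decoded bytes, any undecoded
    # tail to hold on to for the next call, and whether the terminating
    # zero-length chunk has been seen.
    payload = []
    while True:
        nl = buf.find("\r\n")
        if nl == -1:
            break  # size line not complete yet
        size = int(buf[:nl].split(";")[0], 16)  # hex size; drop extensions
        if size == 0:
            return "".join(payload), "", True  # final chunk; trailers ignored
        end = nl + 2 + size + 2  # size line CRLF + body + trailing CRLF
        if len(buf) < end:
            break  # chunk body not complete yet
        payload.append(buf[nl + 2:nl + 2 + size])
        buf = buf[end:]
    return "".join(payload), buf, False

The caller just keeps appending each new recv() result to the leftover and calling it again.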