views: 64
answers: 1
Is there a way to limit the amount of data downloaded by Python's urllib2 module? Sometimes I encounter broken sites that serve something like /dev/random as a page, and it turns out they use up all the memory on my server.

+2  A: 

urllib2.urlopen returns a file-like object, and you can (at least in theory) call .read(N) on it to limit the amount of data returned to at most N bytes.
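
For instance, a minimal sketch of that idea (the URL and the 1 MB cap are just illustrative values):

    import urllib2

    MAX_BYTES = 1024 * 1024  # hypothetical cap: stop after 1 MB

    response = urllib2.urlopen("http://example.com/")  # placeholder URL
    data = response.read(MAX_BYTES)  # read at most MAX_BYTES bytes
    response.close()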

This approach is not entirely fool-proof, because an actively hostile site may go to quite some lengths to fool a reasonably trusting receiver, such as urllib2's default opener; in that case, you'll need to implement and install your own opener that knows how to guard itself against such attacks (for example, by reading no more than a megabyte at a time from the open socket, and so on).
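
A rough sketch of such a guard, assuming a hypothetical helper read_limited and arbitrary default limits; it reads the response in small chunks until a total byte cap is reached, rather than trusting a single large read:

    import urllib2

    def read_limited(url, max_bytes=1024 * 1024, chunk_size=8192):
        """Read at most max_bytes from url, fetching in small chunks."""
        response = urllib2.urlopen(url)
        chunks = []
        total = 0
        try:
            while total < max_bytes:
                chunk = response.read(min(chunk_size, max_bytes - total))
                if not chunk:
                    break  # server finished sending; stop early
                chunks.append(chunk)
                total += len(chunk)
        finally:
            response.close()
        return "".join(chunks)

You could wire similar logic into a custom opener/handler if you want every request in your application to be capped automatically.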

Alex Martelli