I am using the Python lxml library to parse HTML pages:

import lxml.html

# this might run indefinitely
page = lxml.html.parse('http://stackoverflow.com/')

Is there any way to set a timeout for parsing?

A: 

It appears to use urllib.urlopen as the opener, but the easiest way to do this would be to change the default socket timeout:

import socket
timeout = 10
socket.setdefaulttimeout(timeout)

Of course, this is a quick-and-dirty solution, since it changes the default for every new socket in the process.
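To keep the changed default from leaking into unrelated code, the setting can be scoped to a single block and restored afterwards. Below is a minimal sketch of that idea; the helper name default_socket_timeout is made up for illustration and is not part of lxml or the standard library. Whether lxml.html.parse actually honours Python's default socket timeout depends on how it fetches the URL, which is only guessed at above, so treat this as something to try rather than a guaranteed fix.

```python
import contextlib
import socket


@contextlib.contextmanager
def default_socket_timeout(seconds):
    """Temporarily set the process-wide default socket timeout.

    Hypothetical helper for illustration only. It affects sockets that
    are created while the block is active and restores the previous
    default on exit, even if the block raises.
    """
    previous = socket.getdefaulttimeout()  # None means "no timeout"
    socket.setdefaulttimeout(seconds)
    try:
        yield
    finally:
        socket.setdefaulttimeout(previous)
```

With this in place, the call from the question could be wrapped as `with default_socket_timeout(10): page = lxml.html.parse('http://stackoverflow.com/')`. The default is still global while the block runs, so other threads opening sockets at the same time would see it too.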

jathanism