I am using the Python lxml library to parse HTML pages:
import lxml.html
# this might run indefinitely
page = lxml.html.parse('http://stackoverflow.com/')
Is there any way to set a timeout for parsing?
It looks to be using urllib.urlopen as the opener, but the easiest way to do this would be to change the default timeout for sockets:
import socket
timeout = 10  # seconds
socket.setdefaulttimeout(timeout)
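Of course, this is a quick-and-dirty solution: it changes the default timeout for every socket created afterwards in the process, not just the one used for this request.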
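If the global side effect is a concern, one option is to scope the change and restore the previous default afterwards. The following is only a minimal sketch: `default_socket_timeout` is a hypothetical helper name, not part of lxml or the standard library, and it assumes (as above) that the page is fetched through Python sockets.

```python
import contextlib
import socket


@contextlib.contextmanager
def default_socket_timeout(seconds):
    """Temporarily set the process-wide default socket timeout.

    Hypothetical helper for illustration; the previous default is
    restored on exit, even if the body raises.
    """
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    try:
        yield
    finally:
        socket.setdefaulttimeout(previous)


# Intended usage (requires lxml, so it is left commented out here):
#
# import lxml.html
# with default_socket_timeout(10):
#     page = lxml.html.parse('http://stackoverflow.com/')
```

The default is still process-wide while the block is active, so sockets created by other threads during that window pick up the same timeout.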