Hi,

I am using Python's urllib2 with Tor as a proxy to access a website. When I open the site's main page it works fine, but when I try to view the login page (not actually log in, just view it) I get the following error:

URLError: <urlopen error (10060, 'Operation timed out')>

To counteract this I did the following:

import socket
socket.setdefaulttimeout(None)

I still get the same timeout error.

  1. Does this mean the website is timing out on the server side? (I don't know much about HTTP, so sorry if this is a dumb question.)
  2. Is there any way I can correct it so that Python is able to view the page?

Thanks, Rob

A: 

I don't know enough about Tor to be sure, but the timeout may be happening not on the server side but on one of the Tor nodes somewhere between you and the server. In that case there is nothing you can do other than retry the connection.
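Retrying can be automated. The following is a minimal sketch, not something from the original answer: the helper name `retry`, the attempt count, and the delay values are all assumptions chosen for illustration. It catches IOError because urllib2.URLError and socket timeouts are both subclasses of it.

```python
import time


def retry(operation, attempts=3, delay=2.0, retry_on=(IOError,)):
    """Call operation() until it succeeds or the attempts run out.

    `retry`, `attempts` and `delay` are illustrative names and values,
    not part of urllib2.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on as error:
            last_error = error
            # Wait longer after each failure, but not after the last one.
            if attempt < attempts - 1:
                time.sleep(delay * (2 ** attempt))
    raise last_error
```

It could be used as retry(lambda: urllib2.urlopen(url).read()), where url is whatever page is being requested. If the failure is not transient, the last error is simply raised again after the final attempt.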

knabar
+2  A: 

According to the Python Socket Documentation the default is no timeout so specifying a value of "None" is redundant.

There are a number of possible reasons that your connection is dropping. One could be that your user-agent is "Python-urllib" which may very well be blocked. To change your user agent:

import urllib2
request = urllib2.Request('http://site.com/login')  # the URL must include a scheme
request.add_header('User-Agent','Mozilla/5.0 (X11; U; Linux i686; it-IT; rv:1.9.0.2) Gecko/2008092313 Ubuntu/9.04 (jaunty) Firefox/3.5')

You may also want to try overriding the proxy settings before you try and open the url using something along the lines of:

proxy = urllib2.ProxyHandler({"http":"http://127.0.0.1:8118"})  
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
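
One thing that may be worth checking: a ProxyHandler built from a dict that contains only an "http" key does not route https requests through that proxy. The sketch below combines both suggestions and maps both schemes to the same address. It assumes the local proxy on 127.0.0.1:8118 also accepts HTTPS traffic, which is not confirmed here; `build_proxy_opener` and the short user-agent string are illustrative names, not part of urllib2.

```python
try:
    import urllib2 as request_lib          # Python 2, as used in this thread
except ImportError:
    import urllib.request as request_lib   # Python 3 equivalent

# Local proxy address taken from the snippet above.
PROXY_URL = "http://127.0.0.1:8118"


def build_proxy_opener(proxy_url=PROXY_URL, user_agent="Mozilla/5.0"):
    """Build an opener that uses one proxy for http and https requests.

    Assumption: the same proxy is able to handle both schemes.
    """
    proxy = request_lib.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = request_lib.build_opener(proxy)
    # addheaders is sent with every request made through this opener.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


opener = build_proxy_opener()
```

Calling opener.open(url) instead of urllib2.urlopen(url) keeps the proxy and header settings local to this opener rather than installing them globally.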
Andrew Austin
Thanks for the reply. I had already built and installed the proxy handler but I hadn't thought of trying to change the user-agent. I tried but it didn't change anything. I also don't think it is being explicitly blocked because I am able to access the main page. Is it possible that the site has a set default timeout that is small and the additional time taken by the proxy is causing a timeout?
I think what you suggest may be possible, but unlikely. Another option to consider is that the site is performing a reverse lookup, detecting that you are coming from a known proxy, and subsequently dropping your connection. I had this problem, or rather, that was my best guess when I encountered a similar issue with Yahoo/Yahoo Mail. Have you tried other domains without issue?
Andrew Austin
I have tried other domains without issue including other https domains but I am able to access a lot of pages on the site, just not the login. Is this consistent with a site that is performing a reverse lookup? Thanks, Rob
Okay, so I tested it further and I am able to use the proxy to access https on multiple different sites, just not this one specific site. Furthermore, I can access any http page on this specific site, but any time I try to access an https page I receive the timeout error. I also get the timeout error if I try to access a non-existent https page on the site (e.g. https://www.domain-i-am-testing.com/sdljkfsjlkdflksdadj).
And lastly when I open the login page through a popular online proxy I am able to access it which makes me think they probably aren't doing a reverse lookup.
A: 

urllib2.urlopen(url[, data][, timeout])

The optional timeout parameter specifies a timeout in seconds for blocking operations like the connection attempt (if not specified, the global default timeout setting will be used). This actually only works for HTTP, HTTPS, FTP and FTPS connections.

http://docs.python.org/library/urllib2.html
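
As a rough illustration of the timeout parameter, the sketch below passes an explicit per-request timeout and reports why a request failed. The function name `fetch`, the 30-second default, and the optional `opener` argument are assumptions made for this example. A timeout while connecting usually arrives wrapped in a URLError, while a timeout while reading can surface as socket.timeout directly, so both are caught.

```python
import socket

try:
    import urllib2 as request_lib          # Python 2, as used in this thread
    from urllib2 import URLError
except ImportError:
    import urllib.request as request_lib   # Python 3 equivalent
    from urllib.error import URLError


def fetch(url, timeout=30, opener=None):
    """Return (body, None) on success or (None, reason) on failure.

    `timeout` is in seconds; 30 is an arbitrary illustrative default.
    `opener` may be any object with an open(url, timeout=...) method.
    """
    open_url = opener.open if opener is not None else request_lib.urlopen
    try:
        response = open_url(url, timeout=timeout)
        return response.read(), None
    except socket.timeout:
        # Raised directly when the read itself times out.
        return None, "timed out"
    except URLError as error:
        # Connection-level failures, including connect timeouts.
        return None, str(error.reason)
```

A per-request timeout like this overrides the global default set with socket.setdefaulttimeout for that one call only.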

Unknown