I found that you can't read from some sites using Python's urllib2 (or urllib). An example:

urllib2.urlopen("http://www.dafont.com/").read()
# Returns ''

These sites work fine when you visit them in a browser. I can even scrape them using PHP (I haven't tried other languages). I have seen other sites with the same issue, but I can't remember their URLs at the moment.

My questions are...

  1. What is the cause of this issue?
  2. Any workaround for this issue?
A: 

I'm the guy who posted the question. I have some suspicions, but I'm not sure about them; that's why I posted the question here.

What is the cause of this issue?

I think it's due to the host blocking the urllib library using robots.txt or .htaccess, but I'm not sure about that. I'm not even sure if it's possible.
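One thing worth knowing: robots.txt is purely advisory, and urllib2 never consults it on its own, so robots.txt by itself can't make the read come back empty. You can still inspect it to see what the site asks of crawlers; a minimal sketch, using the example site from the question:

import urllib2

# robots.txt only *asks* automated clients to stay away;
# the server has to enforce any actual blocking itself.
try:
    print urllib2.urlopen("http://www.dafont.com/robots.txt").read()
except urllib2.HTTPError as e:
    print "no robots.txt served (HTTP %d)" % e.code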

Any workaround for this issue?

If you are on Unix, this works:

import commands  # Python 2 only; removed in Python 3

contents = commands.getoutput("curl -s '" + url + "'")
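A note on that one-liner: if the URL itself contains a single quote, the shell command breaks. Here is a sketch that avoids shell quoting entirely, assuming the curl binary is on your PATH:

import subprocess

# Pass the arguments as a list so no shell is involved at all;
# assumes curl is installed and on PATH.
contents = subprocess.Popen(
    ["curl", "-s", url], stdout=subprocess.PIPE
).communicate()[0]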
Binny V A
+3  A: 

I believe the request gets blocked based on its User-Agent header. You can change the User-Agent using the following sample code:

import urllib2

USERAGENT = 'something'              # whatever UA string you want to send
HEADERS = {'User-Agent': USERAGENT}

req = urllib2.Request(URL_HERE, headers=HEADERS)  # URL_HERE: your target URL
f = urllib2.urlopen(req)
s = f.read()
f.close()
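For instance, a browser-style string tends to be what these sites expect; the exact value below is only an illustration, not something the site is known to require:

import urllib2

# Illustrative browser-like UA string; any realistic value may do.
HEADERS = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) Firefox/3.6'}
req = urllib2.Request("http://www.dafont.com/", headers=HEADERS)
print urllib2.urlopen(req).read()[:200]  # should no longer be '' if the UA was the problem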
livibetter
These cluelessly-run sites seem intent on forcing everyone to use a generic UA, ultimately breaking the header for everyone.
Glenn Maynard
+2  A: 

Try setting a different user agent; see http://stackoverflow.com/questions/802134/changing-user-agent-on-urllib2-urlopen
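For reference, a minimal sketch of the opener-based variant described at that link, so every request made through the opener carries the custom header (the UA value is illustrative):

import urllib2

opener = urllib2.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0')]  # illustrative value
html = opener.open('http://www.dafont.com/').read()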

z33m