If I enter this URL in a browser, it returns the valid XML data that I am interested in scraping.

http://www.facebook.com/ajax/stream/profile.php?__a=1&profile_id=36343869811&filter=2&max_time=0&try_scroll_load=false&_log_clicktype=Filter%20Stories%20or%20Pagination&ajax_log=0

However, if I make the same request from the server side, it no longer works as it did previously. It now returns only this error, which seems to be the default error message:

{u'silentError': 0, u'errorDescription': u"Something went wrong. We're working on getting it fixed as soon as we can.", u'errorSummary': u'Oops', u'errorIsWarning': False, u'error': 1357010, u'payload': None}

Here is the code in question. I've tried multiple User-Agent strings, to no avail:

import urllib2

# Identify as a desktop Firefox browser.
user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 6.1; he; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3'
uaheader = {'User-Agent': user_agent}
wallurl = 'http://www.facebook.com/ajax/stream/profile.php?__a=1&profile_id=36343869811&filter=2&max_time=0&try_scroll_load=false&_log_clicktype=Filter%20Stories%20or%20Pagination&ajax_log=0'

req = urllib2.Request(wallurl, headers=uaheader)
resp = urllib2.urlopen(req)
# convertTextToUnicode is a helper defined elsewhere (not shown).
pageData = convertTextToUnicode(resp.read())
print pageData  # prints the error shown above

What would be the difference between the server calls and my own browser aside from User Agents and IP addresses?
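One way to narrow this down is to list the headers the script sets on the request and compare them with what the browser sends (visible in its developer tools). The following is a minimal sketch, written for Python 3's `urllib.request` rather than `urllib2`; the helper name `outgoing_headers` is made up for illustration. It builds the request without sending it, so it only shows headers set explicitly, not the ones urllib adds at send time (such as `Host`).

```python
import urllib.request


def outgoing_headers(url, headers):
    """Build a request without sending it and return the headers set on it."""
    req = urllib.request.Request(url, headers=headers)
    # header_items() lists the headers attached to this Request object.
    return dict(req.header_items())


wallurl = ('http://www.facebook.com/ajax/stream/profile.php?__a=1'
           '&profile_id=36343869811&filter=2&max_time=0&try_scroll_load=false'
           '&_log_clicktype=Filter%20Stories%20or%20Pagination&ajax_log=0')

hdrs = outgoing_headers(wallurl, {'User-Agent': 'Mozilla/5.0'})
print(hdrs)
# A signed-in browser would also send a Cookie header; this request has none.
print('Cookie' in hdrs)
```

If the browser request carries a `Cookie` header and the script's request does not, that is a difference beyond the User-Agent and the IP address.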

+2  A: 

I tried the above URL in both Chrome and Firefox. It works in Chrome but fails in Firefox. In Chrome I am signed in to Facebook, while in Firefox I am not.

This could be the reason for the discrepancy: a signed-in browser sends its Facebook session cookies with the request, while your script sends none. You will need to add authentication to the urllib2-based script you posted.
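As a rough sketch of what that could look like, the code below builds an opener that stores cookies and sends them back on later requests. It is written for Python 3 (`urllib.request` and `http.cookiejar`); in Python 2 the same classes live in `urllib2` and `cookielib`. The function names, and the `login_url` and `form_fields` arguments, are placeholders for illustration only. Facebook's actual login endpoint and form fields are not covered here and would have to be taken from the real login page.

```python
import http.cookiejar
import urllib.parse
import urllib.request

USER_AGENT = ('Mozilla/5.0 (Windows; U; Windows NT 6.1; he; rv:1.9.2.3) '
              'Gecko/20100401 Firefox/3.6.3')


def build_session(user_agent=USER_AGENT):
    """Return (opener, jar): an opener that stores and resends cookies."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    # These headers are added to every request made through the opener.
    opener.addheaders = [('User-Agent', user_agent)]
    return opener, jar


def log_in(opener, login_url, form_fields):
    """POST a login form; cookies set by the response land in the jar.

    login_url and form_fields are hypothetical placeholders.
    """
    body = urllib.parse.urlencode(form_fields).encode('utf-8')
    with opener.open(login_url, data=body) as resp:
        return resp.getcode()


def fetch(opener, url):
    """GET a URL through the same opener, so stored cookies are sent along."""
    with opener.open(url) as resp:
        return resp.read()
```

The idea is to call `log_in` once and then `fetch` the wall URL through the same opener, so that whatever session cookies the login response set accompany the second request.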

There is an existing question on authentication with urllib2.

pyfunc