Hi.

The following URL (and others like it) opens fine in a browser, but urllib2.urlopen raises an HTTPError with status 404 for it: http://store.ovi.com/#/applications?categoryId=20&fragment=1&page=1

geturl() returns the same URL, so there is no redirect. I copied the request headers from Firebug and passed them to the request; setting them with add_header instead gives the same result. wget fetches the URL when I run it from the console, but not when I call it from the script.

The code:

import socket
import urllib2

source_url = 'http://store.ovi.com/#/applications?categoryId=20&fragment=1&page=1'

# The snippet was the body of a function; the name fetch_source is only illustrative.
def fetch_source(url):
    # Apply a 10-second timeout to every socket created from here on.
    socket.setdefaulttimeout(10)
    # Request headers copied from Firebug.
    hdrs = {'User-Agent': 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US;rv:1.9.0.13) Gecko/2009073021 Firefox/3.0.13 AppEngine-Google;(+http://code.google.com/appengine)','Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8','Cookie':'JNPRSESSID=4u4devdrt7eb6e0qem3gin47i2; s_cc=true; undefined_s=First%20Visit; s_nr=1282817443274; s_sq=%5B%5BB%5D%5D; view=Grid; menu=menuOpen; OVI_DEVICE=b5130'}
    try:
        req = urllib2.Request(url, headers=hdrs)
        resp = urllib2.urlopen(req)
        return resp.read()
    except urllib2.HTTPError as e:
        # Dump the status, reason, headers and body of the error response.
        print e.code
        print e.msg
        print e.headers
        print e.fp.read()

htmlSource = fetch_source(source_url)

The error output:
404
Not Found
Date: Thu, 26 Aug 2010 10:12:52 GMT
Server: Apache/2.2.3 (Red Hat)
X-Powered-By: PHP/5.2.2
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Keep-Alive: timeout=7, max=333
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8
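
A possible lead, offered as a sketch rather than a confirmed diagnosis: everything after the # in the URL is a fragment, which a browser keeps on the client side and does not send to the server. The snippet below only uses the standard library to split the URL and show which part is the fragment. The variable names base_url, fragment and parts are illustrative, and whether a given urllib2 version sends the fragment in the request is an assumption I have not verified here.

```python
try:
    from urllib.parse import urldefrag, urlsplit  # Python 3
except ImportError:
    from urlparse import urldefrag, urlsplit      # Python 2

source_url = 'http://store.ovi.com/#/applications?categoryId=20&fragment=1&page=1'

# Split off the fragment, i.e. everything after the first '#'.
base_url, fragment = urldefrag(source_url)
parts = urlsplit(source_url)

print(base_url)          # the part a browser would actually request
print(fragment)          # the part a browser keeps to itself
print(repr(parts.path))  # path seen by the URL parser
print(repr(parts.query)) # query string seen by the URL parser
```

If the split shows that the categoryId and page values live in the fragment and not in the query string, then the page a browser displays is presumably assembled by JavaScript from separate requests, and those requests (visible in Firebug's Net panel) would be the ones to reproduce from the script.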

Am I doing something wrong? If not, is there a workaround? Any help would be appreciated. Thanks!