I am writing client-side Python unit tests to verify that the HTTP 302 redirects on my Google App Engine site point to the right pages. So far, I have been calling urllib2.urlopen(my_url).geturl(). However, I have run into two issues:

  1. The URL returned by geturl() does not appear to include the URL query string, such as ?k1=v1&k2=v2. How can I see it? (I need to check whether I correctly passed the visitor's original query string along to the redirect page.)
  2. geturl() shows the final URL after any additional redirects. I just care about the first redirect (the one from my site); I am agnostic to anything after that. For example, let's assume my site is example.com. If a user requests http://www.example.com/somepath/?q=foo, I might want to redirect them to http://www.anothersite.com?q=foo. That other site might do another redirect to http://subdomain.anothersite.com?q=foo, which I can't control or predict. How can I make sure my redirect is correct?
+4  A: 

Use httplib (and look at the return status and Location header of the response) to avoid the "auto-follow redirects" that's impeding your testing. There's a good example here.
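A minimal sketch of that approach, written for Python 3, where httplib is named http.client (on Python 2 the same calls exist on httplib). The helper name first_redirect is made up for illustration. It sends one GET request and returns the status code and Location header without following anything:

```python
import http.client
from urllib.parse import urlsplit


def first_redirect(url, timeout=10):
    """Send one GET request; return (status, Location header) without following redirects."""
    parts = urlsplit(url)
    # Pick the connection class from the URL scheme.
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        # Rebuild the request target, keeping the original query string.
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("GET", target)
        response = conn.getresponse()
        # getheader() returns None if the server sent no Location header.
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

In a unit test you would then call something like status, location = first_redirect("http://www.example.com/somepath/?q=foo") and assert that status is 302 and that location is the URL you expect, query string included.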

Alex Martelli
A: 

I would do this "old-school". Use urlopen to get the source code of the page that you visit first. Store the URL of this page in a global variable.

Have a Boolean variable that is true if the current page redirects you to another page, false otherwise. Initialize it to true.

Enter a while loop that runs as long as redirect is true. Inside the loop:

  • Screen-scrape the source code of the current page to find "redirect". You can do this either by string manipulation or by using a regex. If you find "redirect" on the page, keep redirect set to true.

  • Search past the string index of "redirect" in the source code to find the URL that you are being redirected to. Again, you can do this by basic string manipulation or with a regex.

  • Reassign url to be this new URL that you have found.

  • If you do not find "redirect" in the source code of the page that you are on, set redirect to false. This makes you exit the while loop, and url will contain the last URL in the chain of redirects.

I'm currently working on an assignment, else I would have provided actual code. Hope this helps.
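The loop described above could be sketched roughly as follows. Everything here is an assumption made for illustration: the function name, the fetch parameter (any callable that returns a page's source as a string), the regex, and the max_hops guard against endless loops. Note that urlopen already follows HTTP-level redirects on its own, so this only sees redirects that are written into the page source:

```python
import re

# Hypothetical pattern: the word "redirect", then the first absolute URL after it.
REDIRECT_RE = re.compile(r"redirect.*?(https?://[^\s\"'<>]+)", re.IGNORECASE | re.DOTALL)


def follow_page_redirects(url, fetch, max_hops=10):
    """Follow redirects announced in page source; return every URL visited, in order."""
    chain = [url]
    redirect = True  # assume a redirect until a page proves otherwise
    while redirect and len(chain) <= max_hops:
        source = fetch(url)
        match = REDIRECT_RE.search(source)
        if match:
            # Found "redirect" followed by a URL: move on to that URL.
            url = match.group(1)
            chain.append(url)
        else:
            # No "redirect" on this page: stop looping.
            redirect = False
    return chain
```

The first redirect is then chain[1] (if the list has more than one entry) and the final URL is chain[-1]. For real pages, fetch could wrap urlopen and decode the response body.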

inspectorG4dget
+3  A: 

Pass follow_redirects=False to urlfetch.fetch, then read the target of the first redirect from the 'Location' header of the response, like so:

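from google.appengine.api import urlfetch
# Fetch without following redirects, then read the first redirect's target.
response = urlfetch.fetch(your_url, follow_redirects=False)
location = response.headers['Location']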
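Once you have the Location value, checking that the original query string was passed along could look something like this. This is a sketch for Python 3 (on Python 2 the same functions live in the urlparse module), and the helper name same_query is made up for illustration. It compares parsed parameters, so parameter order does not matter:

```python
from urllib.parse import urlsplit, parse_qs


def same_query(original_url, location):
    """Return True if both URLs carry the same query parameters and values."""
    original = parse_qs(urlsplit(original_url).query, keep_blank_values=True)
    redirected = parse_qs(urlsplit(location).query, keep_blank_values=True)
    return original == redirected
```

For the example in the question, same_query("http://www.example.com/somepath/?q=foo", location) should hold when location is http://www.anothersite.com?q=foo.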
Nick Johnson
Thanks Nick. Although Alex's response was very good, this is precisely what I was looking for.
RexE