If I point Firefox at http://bitbucket.org/tortoisehg/stable/wiki/Home/ReleaseNotes, I get a page of HTML. But if I try this in Python:

import urllib

site = 'http://bitbucket.org/tortoisehg/stable/wiki/Home/ReleaseNotes'
req = urllib.urlopen(site)
text = req.read()

I get the following:

500 Internal Server Error The server encountered an internal error or misconfiguration and was unable to complete your request.

What am I doing wrong?

A: 

I don't think you're doing anything wrong; perhaps the server was just down when you tried? Your script worked fine for me ('text' contained the same data as the page displayed in the browser).

JP Lodine
Funny, I can reproduce the OP's problem exactly: from any browser I see a rich page, but from urllib I get a 587-byte HTML result which DOES say `<title>500 Internal Server Error</title>` and so on.
Alex Martelli
+3  A: 

You're doing nothing wrong, on the surface, and as the error page says you should contact the site's administrators because they're the ones with the server logs which may explain what's happening. Fortunately, bitbucket's site admins are a friendly bunch!

No doubt there is some header or combination of headers that browsers set one way and urllib sets another way, and a bug on the server gets tickled in the latter case. You may want to see exactly what headers are being sent, e.g. with Firebug in Firefox, and reproduce those one by one until you isolate the server bug; most likely it's the user agent or some "accept"-ish header that's tickling it.
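To compare against what the browser sends, it can help to see what urllib adds on its own. The sketch below assumes Python 3, where the question's `urllib.urlopen` lives in `urllib.request`; the helper name `default_urllib_headers` is made up for illustration. It only inspects the default opener and makes no network request.

```python
import urllib.request


def default_urllib_headers():
    """Return the headers urllib attaches to every request by default.

    Hypothetical helper: it reads the `addheaders` list of a freshly
    built opener, which normally holds only the User-agent header.
    """
    opener = urllib.request.build_opener()
    return dict(opener.addheaders)


headers = default_urllib_headers()
for name, value in headers.items():
    # The User-agent value contains "urllib", unlike a browser's.
    print(name + ": " + value)
```

Setting this output next to the request headers shown by the browser's tools gives a short list of candidates to try, starting with the user agent.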

Alex Martelli
+3  A: 

You are not doing anything wrong. Bitbucket does some user agent detection (to detect Mercurial clients, for example), and the default urllib user agent trips it up. Changing the user agent to anything that doesn't contain urllib as a substring fixes it.
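A minimal sketch of that workaround, assuming Python 3 (`urllib.request` in place of the question's Python 2 `urllib`). The user agent string and the names `build_request` and `fetch` are placeholders chosen for illustration; any value without "urllib" in it should do, going by the behaviour described above.

```python
import urllib.request

SITE = 'http://bitbucket.org/tortoisehg/stable/wiki/Home/ReleaseNotes'

# Placeholder value; the only requirement assumed here is that it
# does not contain the substring "urllib".
USER_AGENT = 'Mozilla/5.0 (compatible; example-client)'


def build_request(url, user_agent=USER_AGENT):
    """Build a request that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def fetch(url):
    """Fetch the URL with the custom user agent and return the raw body."""
    with urllib.request.urlopen(build_request(url)) as response:
        return response.read()


req = build_request(SITE)
print(req.get_header('User-agent'))
```

Calling `fetch(SITE)` would then perform the actual download; it is left out of the sketch because the URL dates from the original question and may no longer resolve.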

You should file an issue about this: http://bitbucket.org/jespern/bitbucket/issues/new/

tonfa
Changing the user agent fixed it, like you said. TVM.
Charles Anderson