views:

209

answers:

2

Hi,

Here is the URL of the site I want to fetch

https://salami.parc.com/spartag/GetRepository?friend=jmankoff&keywords=antibiotic&option=jmankoff%27s+tags

When I fetch the web site with the following code and display the contents with the following code:

sock = urllib.urlopen("https://salami.parc.com/spartag/GetRepository?friend=jmankoff&keywords=antibiotic&option=jmankoff's+tags")
html = sock.read()
sock.close()
soup = BeautifulSoup(html)
print soup.prettify()

I get the following output:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
 <head>
  <title>
   Error message
  </title>
 </head>
 <body>
  <h2>
   Invalid input data
  </h2>
 </body>
</html>

I get the same result with urllib2 as well. Now interestingly, this URL works on only Shiretoko web browser v3.5.7. (when I say it works I mean that it brings me the right page). When I feed this URL into Firefox 3.0.15 or Konqueror v4.2.2. I get exactly the same error page (with "Invalid input data"). I don't have any idea what creates this difference and how I can fetch this page using Python. Any ideas?

Thanks

+2  A: 

If you see the urllib2 doc, it says

urllib2.build_opener([handler, ...])¶

    .....
    If the Python installation has SSL support (i.e., if the ssl module can be imported), HTTPSHandler will also be added. 

    .....

you can try using urllib2 together with ssl module. alternatively, you can use httplib

ghostdog74
A: 

That's exactly what you get when you click on the link with a webbrowser. Maybe you are supposed to be logged in or have a cookie set or something

I get the same message for firefox 3.5.8 (shiretoko) on linux

gnibbler