How would you go about downloading a webpage behind an HTTPS login using a language such as Python? More specifically, I am talking about the page behind the login at http://www.cnbtn.com.

+2  A: 

HTTPS itself will not matter; it only means that the data going over the wire is encrypted. What you need to learn is how the login actually works. For instance, is it Basic Auth (where a browser popup asks for user/pass)? If so, you can put the credentials in the URL and make a request like https://user:[email protected]/my_file.gif
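If it does turn out to be Basic Auth, a minimal Python sketch might look like the following. It sends the credentials in an `Authorization` header, which is what the browser ends up doing after the popup. The function names and the destination path are made up for illustration, and nothing here has been tried against the site in the question.

```python
import base64
import urllib.request


def basic_auth_header(user, password):
    """Return the value of an HTTP Basic 'Authorization' header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def build_basic_auth_request(url, user, password):
    """Build (but do not send) a request that carries Basic Auth credentials."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(user, password))
    return request


def download(url, user, password, dest_path):
    """Fetch url with Basic Auth and write the response body to dest_path."""
    request = build_basic_auth_request(url, user, password)
    with urllib.request.urlopen(request) as response, open(dest_path, "wb") as out:
        out.write(response.read())
```

Calling `download(url, "user", "pass", "my_file.gif")` would perform the actual transfer; over HTTPS the header is encrypted along with the rest of the request.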

More likely it is some other authentication scheme, which could do basically anything. You'll need to reverse engineer it to figure out what you need to send to get in. Most likely you'll need an HTTP client library that maintains state across requests (one that keeps cookies, etc.).
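As a rough sketch of the "client that maintains state" idea, Python's standard library can keep cookies between requests with `http.cookiejar`. Everything specific here is an assumption: the login URL, the page URL, and the form field names all have to come from inspecting the real login form.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener that stores cookies in a jar and replays them later."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def encode_login_form(fields):
    """Encode form fields as an application/x-www-form-urlencoded body."""
    return urllib.parse.urlencode(fields).encode("utf-8")


def fetch_behind_login(login_url, page_url, fields):
    """POST the login form, then fetch page_url with the resulting cookies."""
    opener, _jar = make_session()
    # Passing data makes this a POST; any cookies set by the server land in the jar.
    with opener.open(login_url, data=encode_login_form(fields)) as response:
        response.read()
    # The same opener sends the stored cookies back on this second request.
    with opener.open(page_url) as response:
        return response.read()
```

A call might look like `fetch_behind_login(login_url, page_url, {"username": "me", "password": "secret"})`, with the field names replaced by whatever the form really uses. The default opener also follows redirects, which login pages often rely on.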

Good luck

mlathe
RE reverse engineering: Try initiating the download manually while capturing the request that your browser sends to the server (using Fiddler or Wireshark). Then send a similar request, customizing it to your needs. I've successfully used this in the past to build Perl scripts to automate downloads from sites that were designed to defeat download managers.
Nate C-K
It's not Basic Auth like that; it appears to post to an ASP page.
allen
Using something like the "Live HTTP Headers" Firefox add-on will definitely help you understand what your browser is doing to log in. I was solving this problem to get Nutch to crawl a site. In order to crawl the authenticated pages, I needed to do several things, including remembering cookies and following some redirect requests.
mlathe
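One thing a captured request often reveals is that the login form sends hidden fields along with the username and password. This is only an assumption about the site in question, but ASP.NET Web Forms pages commonly include hidden inputs such as `__VIEWSTATE`. A small sketch for pulling those fields out of the login page's HTML so they can be posted back (the class and function names are made up for illustration):

```python
from html.parser import HTMLParser


class HiddenFieldCollector(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def hidden_fields(html_text):
    """Return a dict of the hidden form fields found in html_text."""
    collector = HiddenFieldCollector()
    collector.feed(html_text)
    return collector.fields
```

The resulting dict can be merged with the username and password fields before the login POST is sent.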