views:

7011

answers:

3

Hi!

I want to download and parse webpage using python, but to access it I need a couple of cookies set. Therefore I need to login over https to the webpage first. The login moment involves sending two POST params (username, password) to /login.php. During the login request I want to retrieve the cookies from the response header and store them so I can use them in the request to download the webpage /data.php.

How would I do this in python (preferably 2.6)? If possible I only want to use builtin modules.

+22  A: 
import urllib, urllib2, cookielib

username = 'myuser'
password = 'mypassword'

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'username' : username, 'j_password' : password})
opener.open('http://www.example.com/login.php', login_data)
resp = opener.open('http://www.example.com/hiddenpage.php')
print resp

resp is the straight html of the page you want to open, and you can use opener to view any page using your session cookie.

Harley
Awesome answer.
Paolo Bergantino
+1  A: 

Almost, but resp will just print the file descriptor try print resp.readLines()

A: 

@:Harley

i'm stating in python... and it's very usefull thanks a lot!!!

@Amit

I think you want say : resp.read()