Hi folks,
I need to access a few HTML pages from a Python script. The problem is that I need cookie functionality, so a simple urllib HTTP request won't work.
Any ideas?
Here's something that does cookies, and as a bonus does authentication for a site that requires a username and password.
import urllib2      # Python 2; renamed urllib.request in Python 3
import cookielib    # Python 2; renamed http.cookiejar in Python 3

def cook():
    url = "http://wherever"
    cj = cookielib.LWPCookieJar()

    # Basic-auth handler for the site's protected realm.
    authinfo = urllib2.HTTPBasicAuthHandler()
    realm = "realmName"
    username = "userName"
    password = "passWord"
    host = "www.wherever.com"
    authinfo.add_password(realm, host, username, password)

    # Build and install an opener that handles both cookies and auth.
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj), authinfo)
    urllib2.install_opener(opener)

    # Create the request object with a browser-like User-Agent.
    txheaders = {'User-agent': "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"}
    try:
        req = urllib2.Request(url, None, txheaders)
        cj.add_cookie_header(req)  # redundant once the opener is installed, but harmless
        f = urllib2.urlopen(req)
    except IOError, e:
        print "Failed to open", url
        if hasattr(e, 'code'):
            print "Error code:", e.code
    else:
        print f
        print f.read()
        print f.info()
        f.close()
        print 'Cookies:'
        for index, cookie in enumerate(cj):
            print index, ":", cookie
        cj.save("cookies.lwp")
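As a follow-up: the jar saved by cj.save("cookies.lwp") can be reloaded in a later run, so the session survives between invocations of the script. A minimal round trip, sketched with a hand-built cookie so it needs no network access (the import covers Python 3, where cookielib was renamed http.cookiejar; the cookie's name, value, and domain here are made up for illustration):

```python
try:
    import http.cookiejar as cookielib  # Python 3 name
except ImportError:
    import cookielib                    # Python 2

cj = cookielib.LWPCookieJar()
# Build a cookie by hand purely for the demo; normally the server sets it.
cj.set_cookie(cookielib.Cookie(
    version=0, name='session', value='abc123',
    port=None, port_specified=False,
    domain='www.wherever.com', domain_specified=True, domain_initial_dot=False,
    path='/', path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={}))

# Session cookies are "discard"-able, so the flag is needed to persist them.
cj.save('cookies.lwp', ignore_discard=True)

# A fresh jar in a later run picks the cookie back up from disk.
cj2 = cookielib.LWPCookieJar()
cj2.load('cookies.lwp', ignore_discard=True)
print(len(cj2))  # 1
```

Pass cj2 (or a jar loaded the same way) to HTTPCookieProcessor and the reloaded cookies go out with subsequent requests.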
The cookielib module provides cookie handling for HTTP clients.
The cookielib module defines classes for automatic handling of HTTP cookies. It is useful for accessing web sites that require small pieces of data – cookies – to be set on the client machine by an HTTP response from a web server, and then returned to the server in later HTTP requests.
The examples in the documentation show how to process cookies in conjunction with urllib2:
import cookielib, urllib2
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
r = opener.open("http://example.com/")
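If you want to watch the whole round trip without hitting a real site, you can point that same opener at a tiny local server. This sketch uses the Python 3 module names (urllib2 became urllib.request, cookielib became http.cookiejar); the cookie name and value are invented for the demo:

```python
import threading
import http.cookiejar
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class CookieSetter(BaseHTTPRequestHandler):
    """Tiny handler that sets one cookie on every GET."""
    def do_GET(self):
        self.send_response(200)
        self.send_header('Set-Cookie', 'session=abc123; Path=/')
        self.send_header('Content-Length', '2')
        self.end_headers()
        self.wfile.write(b'ok')
    def log_message(self, *args):  # silence per-request logging
        pass

# Port 0 asks the OS for any free port.
server = HTTPServer(('127.0.0.1', 0), CookieSetter)
threading.Thread(target=server.serve_forever, daemon=True).start()

cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
opener.open('http://127.0.0.1:%d/' % server.server_port).read()

# The processor extracted the Set-Cookie header into the jar for us.
cookies = [(c.name, c.value) for c in cj]
print(cookies)  # [('session', 'abc123')]
server.shutdown()
```

A second opener.open() against the same host would send the cookie back automatically, which is exactly the behaviour the original question asked for.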
Check out Mechanize ("Stateful programmatic web browsing in Python"). It handles cookies automagically.