I am trying to write a function to post form data and save returned cookie info in a file so that the next time the page is visited, the cookie information is sent to the server (i.e. normal browser behavior).

I wrote this relatively easily in C++ using libcurl, but have spent almost an entire day trying to write it in Python with urllib2, still without success.

This is what I have so far:

import sys
import urllib, urllib2
import logging

# the path and filename to save your cookies in
COOKIEFILE = 'cookies.lwp'

cj = None
ClientCookie = None
cookielib = None


logger = logging.getLogger(__name__)

# Let's see if cookielib is available
try:
    import cookielib
except ImportError:
    logger.debug('importing cookielib failed. Trying ClientCookie')
    try:
        import ClientCookie
    except ImportError:
        logger.debug('ClientCookie isn\'t available either')
        urlopen = urllib2.urlopen
        Request = urllib2.Request
    else:
        logger.debug('imported ClientCookie successfully')
        urlopen = ClientCookie.urlopen
        Request = ClientCookie.Request
        cj = ClientCookie.LWPCookieJar()

else:
    logger.debug('Successfully imported cookielib')
    urlopen = urllib2.urlopen
    Request = urllib2.Request

    # This is a subclass of FileCookieJar
    # that has useful load and save methods
    cj = cookielib.LWPCookieJar()


login_params = {'name': 'anon', 'password': 'pass' }

def login(theurl, login_params):
  init_cookies();

  data = urllib.urlencode(login_params)
  txheaders =  {'User-agent' : 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}

  try:
    # create a request object
    req = Request(theurl, data, txheaders)

    # and open it to return a handle on the url
    handle = urlopen(req)

  except IOError, e:
    logger.debug('Failed to open "%s".' % theurl)
    if hasattr(e, 'code'):
      logger.debug('Failed with error code - %s.' % e.code)
    elif hasattr(e, 'reason'):
      logger.debug("The error object has the following 'reason' attribute: %s" % (e.reason,))
      sys.exit()

  else:

    if cj is None:
      logger.debug('We don\'t have a cookie library available - sorry.')
    else:
      print 'These are the cookies we have received so far :'
      for index, cookie in enumerate(cj):
        print index, '  :  ', cookie

      # save the cookies again  
      cj.save(COOKIEFILE) 

      #return the data
      return handle.read()



# FIXME: I need to fix this so that it takes into account any cookie data we may have stored
def get_page(*args, **query):
    if len(args) != 1:
        raise ValueError(
            "get_page() takes exactly 1 argument (%d given)" % len(args)
        )
    url = args[0]
    query = urllib.urlencode(list(query.iteritems()))
    if not url.endswith('/') and query:
        url += '/'
    if query:
        url += "?" + query
    resource = urllib.urlopen(url)
    logger.debug('GET url "%s" => "%s", code %d' % (url,
                                                    resource.url,
                                                    resource.code))
    return resource.read() 

When I attempt to log in, I pass the correct username and password, yet the login fails and no cookie data is saved.

My two questions are:

  • Can anyone see what's wrong with the login() function, and how can I fix it?
  • How can I modify the get_page() function to make use of any cookie info I have saved?
A:

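There are quite a few problems with the code that you've posted. Typically you'll want to build a custom opener that can handle redirects, HTTPS, etc.; otherwise you'll run into trouble. As for the cookies themselves, you need to call the load and save methods on your cookie jar, use one of its file-based subclasses, such as MozillaCookieJar or LWPCookieJar, and attach the jar to your opener with an HTTPCookieProcessor. In the cookielib branch of your code, cj is created but never connected to urlopen, so it never receives any cookies.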
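A minimal sketch of that setup in Python 3, where cookielib is now http.cookiejar and urllib2's opener functions live in urllib.request. The helper names make_cookie_opener and save_cookies, and the User-agent string, are hypothetical; passing ignore_discard=True assumes you want session cookies (which many login pages set) to survive between runs.

```python
import os
import http.cookiejar
import urllib.request


def make_cookie_opener(cookie_path):
    """Hypothetical helper: return (jar, opener) where every request made
    through `opener` reads from and writes to a file-backed LWPCookieJar."""
    jar = http.cookiejar.LWPCookieJar(cookie_path)
    if os.path.exists(cookie_path):
        # load() skips session cookies by default; ignore_discard=True keeps them.
        jar.load(ignore_discard=True)
    # HTTPCookieProcessor copies Set-Cookie headers into the jar and sends
    # stored cookies back on later requests; without it the jar stays empty.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-agent", "Mozilla/5.0")]  # assumed UA string
    return jar, opener


def save_cookies(jar):
    # save() also skips session cookies unless ignore_discard=True.
    jar.save(ignore_discard=True)
```

A get_page() along these lines would call opener.open(url) instead of urllib.urlopen(url), so any stored cookies are sent with the request, and then call save_cookies(jar) to persist anything new the server set.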

Here's a class I wrote to log in to Facebook, back when I was playing silly web games. I just modified it to use a file-based cookie jar rather than an in-memory one.

import cookielib
import os
import urllib
import urllib2

# set these to whatever your fb account is
fb_username = "[email protected]"
fb_password = "secretpassword"

cookie_filename = "facebook.cookies"

class WebGamePlayer(object):

    def __init__(self, login, password):
        """ Start up... """
        self.login = login
        self.password = password

        self.cj = cookielib.MozillaCookieJar(cookie_filename)
        if os.access(cookie_filename, os.F_OK):
            # ignore_discard=True also restores session cookies
            self.cj.load(ignore_discard=True)
        self.opener = urllib2.build_opener(
            urllib2.HTTPRedirectHandler(),
            urllib2.HTTPHandler(debuglevel=0),
            urllib2.HTTPSHandler(debuglevel=0),
            urllib2.HTTPCookieProcessor(self.cj)
        )
        self.opener.addheaders = [
            ('User-agent', ('Mozilla/4.0 (compatible; MSIE 6.0; '
                           'Windows NT 5.2; .NET CLR 1.1.4322)'))
        ]

        # need this twice - once to set cookies, once to log in...
        self.loginToFacebook()
        self.loginToFacebook()

        # save() skips session cookies unless ignore_discard=True
        self.cj.save(ignore_discard=True)

    def loginToFacebook(self):
        """
        Handle login. This should populate our cookie jar.
        """
        login_data = urllib.urlencode({
            'email' : self.login,
            'pass' : self.password,
        })
        response = self.opener.open("https://login.facebook.com/login.php", login_data)
        return response.read()

test = WebGamePlayer(fb_username, fb_password)

After you've set your username and password, you should see a file, facebook.cookies, with your cookies in it. In practice you'll probably want to modify it to check whether you have an active cookie and use that, then log in again if access is denied.
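A minimal sketch of such a check, using Python 3's http.cookiejar. find_active_cookie is a hypothetical helper, and which cookie name signals a login is site-specific, so it is left as a parameter. It only tells you a cookie has not expired locally; the server may still reject it, so logging in again when access is denied still applies.

```python
import http.cookiejar
import time


def find_active_cookie(jar, domain, name=None, now=None):
    """Hypothetical helper: return the first unexpired cookie for `domain`
    (optionally with a given name), or None. Session cookies, which have
    no expiry time, count as active."""
    now = time.time() if now is None else now
    for cookie in jar:
        # Stored domains may carry a leading dot (".example.com").
        if cookie.domain.lstrip(".") != domain.lstrip("."):
            continue
        if name is not None and cookie.name != name:
            continue
        # is_expired() is False when expires is None (session cookie).
        if not cookie.is_expired(now):
            return cookie
    return None
```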

Anthony Briggs
@anthony: +1 for the code snippet. Your code is much neater and cleaner than mine (I'm just starting to learn to be a Pythonista! ;) I have read and reread your post, and there are two things that are not clear to me. 1) I don't understand why you have to call loginToFacebook() twice. It looks like the cookie will be set each time the loginToFacebook() method is invoked. Could you please clarify? 2) Can you give guidelines on how to check whether an ACTIVE cookie exists?
morpheous
With cookie-based logins, the server first assigns you a cookie, *then* you log in. If you try removing one of the logins, you'll find that you aren't logged in: FB has checked your request, seen that you don't have a cookie, and redirected you back to the login page. A clearer way would be to replace the first call with one that fetches the FB front page, like `def getFBCookie(self): self.opener.open('https://www.facebook.com/')`, which would do the same thing. And yes, it is pretty neat code. Writing it that way takes a bit more time up front, but it pays off when you need to read it later or reuse it :)
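A sketch of that two-step flow in Python 3, written against any opener that has an HTTPCookieProcessor installed. cookie_then_login and its parameters are hypothetical: front_url and login_url stand in for the site's front page and login form, and the keys of the credentials dict must match the form's field names.

```python
import urllib.parse


def cookie_then_login(opener, front_url, login_url, credentials):
    """Hypothetical helper: GET front_url so the server can set its initial
    cookie, then POST credentials to login_url. Assumes `opener` has an
    HTTPCookieProcessor, so the cookie from step 1 is sent in step 2."""
    # Step 1: plain GET; any Set-Cookie headers land in the opener's jar.
    opener.open(front_url).read()
    # Step 2: POST the form; Python 3 requires the body as bytes.
    data = urllib.parse.urlencode(credentials).encode("ascii")
    response = opener.open(login_url, data)
    return response.read()
```

With a real opener this would be called as cookie_then_login(urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar)), ...), where jar is any http.cookiejar.CookieJar.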
Anthony Briggs