I am using urllib2 and HTTPCookieProcessor to log in to a website. I want to log in to multiple accounts concurrently and store the cookies so they can be reused later.

Can you recommend an approach or library to achieve this?

+1  A: 

How to achieve this really depends on your needs: what kind of login is it? Digest authentication? Is it a web form? Is JavaScript involved (you're pretty much screwed if this is the case)? A library like mechanize can help you a lot with such stuff: handling of forms, redirection, authentication, cookies... However, you'd have to take care of concurrency yourself by spawning threads/processes.

Another approach that works beautifully for concurrency is using Twisted. With that solution however you'd have to handle redirection and cookies etc. yourself -- although you might be able to reuse parts of e.g. mechanize.

paprika
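To make the "spawn threads yourself" option concrete, here is a minimal sketch using only the Python 3 standard library (`urllib.request` and `http.cookiejar`, the successors of `urllib2` and `cookielib`), not mechanize. The names `run_account`, `run_all` and `describe` and the account list are made up for illustration, and the stand-in task makes no network request.

```python
from concurrent.futures import ThreadPoolExecutor
import http.cookiejar
import urllib.request


def run_account(account, task):
    """Build a private cookie jar and opener for one account, then run task."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return task(account, opener)


def run_all(accounts, task, max_workers=4):
    """Run task once per account in a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda acct: run_account(acct, task), accounts))


def describe(account, opener):
    """Stand-in task (assumption): a real one would call opener.open(...)
    to submit the login form and fetch pages for this account."""
    return account["user"], opener


accounts = [{"user": "joe"}, {"user": "ann"}]  # hypothetical accounts
results = run_all(accounts, describe)
```

Because every worker creates its own opener, no cookie state is shared between threads, so no locking is needed around the jars.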
It's a simple form-based login with cookies, and I have it working for a single login. Threading isn't the problem so much as handling multiple cookie sessions.
Plumo
Then this has nothing to do with concurrency (you should remove that tag). In this case you could simply instantiate a new CookieJar for every account.
paprika
I will be accessing multiple accounts concurrently so it is related to concurrency. What I meant is threading is not the difficult part.
Plumo
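The "one cookie jar per account" idea from the comments above might look roughly like this. It is a sketch in Python 3, where `urllib2` and `cookielib` became `urllib.request` and `http.cookiejar`; `make_session` and the account names are hypothetical.

```python
import http.cookiejar
import urllib.request


def make_session():
    """Return (opener, jar) for one account; nothing is shared between calls."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


# Hypothetical account names, for illustration only.
sessions = {name: make_session() for name in ("joe", "ann")}
```

Requests made through `sessions["joe"][0]` only ever read and write cookies in `sessions["joe"][1]`, so the two logins cannot overwrite each other's session cookie.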
+1  A: 

The OP clarified that this is not a concurrency issue. With sequential processing in mind, this is much simpler. I once used something like the following to update a bunch of SIP phone base stations (they had a web front-end which you could use to upload VCard files for the phone book). Note that I just cut away some crap and renamed this and that in this hacky script; I did not test it at all. Its sole purpose is to give the OP an idea of how he could deal with this.

#!/usr/bin/python
# -*- coding:utf-8 -*-

from optparse import OptionParser
import sys
from mechanize import Browser, CookieJar


accounts = [
    {'ipaddr': '127.0.0.1', 'user': 'joe', 'pass': 'foobar'},
    ]


class WebsiteAccount(object):

    def __init__(self, ipaddr, username, password, browser):
        self.ipaddr = ipaddr
        self.username = username
        self.password = password
        self.browser = browser
        self.cookiejar = CookieJar()
        self.browser.set_cookiejar(self.cookiejar)

    def login(self):
        self.browser.open('http://'+self.ipaddr+'/login.html')
        self.browser.select_form(name='loginform')
        self.browser.form.set_value(self.username, name='username')
        self.browser.form.set_value(self.password, name='password')
        resp = self.browser.submit()
        print 'Logging into account %s@%s ...' % (self.username, self.ipaddr),
        if resp.geturl().endswith('/login.html'):
            print 'FAILED!'
            sys.exit(1)
        print ' OK'

    def logout(self):
        print 'Logging out from account %s@%s ...' % (self.username, self.ipaddr),
        self.browser.open('http://'+self.ipaddr+'/logout.html')
        self.browser.close()
        print 'OK'


def main():
    parser = OptionParser()
    parser.add_option('-d', '--debug', action='store_true', dest='debug', default=False)
    parser.add_option('-v', '--verbose', action='store_true', dest='verbose', default=False)
    (opts, args) = parser.parse_args()
    for account in accounts:
        browser = Browser()
        browser.set_handle_referer(True)
        browser.set_handle_redirect(True)
        browser.set_handle_robots(False)
        bs = WebsiteAccount(account['ipaddr'],
                            account['user'],
                            account['pass'],
                            browser)
        # enable mechanize's debug output when -d/--debug is given
        if opts.debug:
            browser.set_debug_redirects(True)
            browser.set_debug_responses(True)
            browser.set_debug_http(True)
        bs.login()
        try:
            # ... do some stuff
            # save cookies here?  
            pass
        finally:
            # you shouldn't use this if you are interested in the login cookies
            bs.logout()


if __name__=='__main__':
    main()
paprika
Note: using sys.exit like this is gross; use exceptions instead.
paprika
Yeah, I guess something like that with separate cookie jars is what I need to keep the sessions independent.
Plumo
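For the "store the cookies to be reused later" part of the question, one option is a file-backed jar per account. The sketch below uses `http.cookiejar.LWPCookieJar` from the Python 3 standard library (the Python 2 equivalent lives in `cookielib`). The file naming scheme, the helper names and the hand-built cookie are assumptions made so the example runs without a network connection.

```python
import http.cookiejar
import os
import tempfile


def cookie_path(directory, username):
    """Hypothetical naming scheme: one cookie file per account."""
    return os.path.join(directory, username + ".cookies")


def load_jar(path):
    """Return an LWPCookieJar bound to path, loading saved cookies if present."""
    jar = http.cookiejar.LWPCookieJar(path)
    if os.path.exists(path):
        # ignore_discard=True also restores session cookies (no expiry date)
        jar.load(ignore_discard=True)
    return jar


def make_cookie(name, value, domain):
    """Build a session cookie by hand; a real login response would set it."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


workdir = tempfile.mkdtemp()
path = cookie_path(workdir, "joe")

jar = load_jar(path)            # first run: nothing saved yet
jar.set_cookie(make_cookie("sessionid", "abc123", "127.0.0.1"))
jar.save(ignore_discard=True)   # without the flag, session cookies are skipped

restored = load_jar(path)       # later run: cookies come back from disk
```

Login cookies are often session cookies, which `save()` and `load()` skip by default, hence `ignore_discard=True` on both calls. Whether a restored cookie is still accepted depends on the server not having expired the session in the meantime.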