Hey.

I created a new Pylons project and would like to use Cassandra as my database server. I plan on using Pycassa so that I can work with Cassandra 0.7 beta. Unfortunately, I don't know where to instantiate the connection so that it is available throughout my application.

The goal would be to:

  • Create a pool when the application is launched
  • Get a connection from the pool for each request, and make it available to my controllers and libraries (in the context of the request). Ideally the connection would be taken from the pool "lazily", i.e. only if it is actually needed
  • If a connection has been used, release it once the request has been processed

Additionally, is there anything important I should know about this? When I see comments like "Be careful when using a QueuePool with use_threadlocal=True, especially with retries enabled. Synchronization may be required to prevent the connection from changing while another thread is using it.", what does that mean exactly?

Thanks.

-- Pierre

A: 

Okay. I worked a little, I learned a lot, and I found a possible answer.

Creating the pool

The best place to create the pool seems to be in the app_globals.py file, which is basically a container for objects which will be accessible "throughout the life of the application". Exactly what I want for a pool, in fact.

I just added my init code at the end of the file; it reads its settings from the Pylons configuration file:

# Creating an instance of the Pycassa pool.
# Assumes "import pycassa" at the top of app_globals.py.
kwargs = {}

# Parsing servers (comma-separated list); empty entries are dropped
if 'cassandra.servers' in config['app_conf']:
    servers = [s.strip() for s in config['app_conf']['cassandra.servers'].split(',')]
    servers = [s for s in servers if s]
    if servers:
        kwargs['server_list'] = servers

# Parsing timeout; an unparsable value falls back to the pool's default
if 'cassandra.timeout' in config['app_conf']:
    try:
        kwargs['timeout'] = float(config['app_conf']['cassandra.timeout'])
    except ValueError:
        pass

# Finally creating the pool
self.cass_pool = pycassa.QueuePool(keyspace='Keyspace1', **kwargs)

I could have done better, for example by moving this into a function or by supporting more parameters (pool size, ...), which I'll do.
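For what it's worth, moving the parsing into a function could look roughly like this. It is only a sketch: the helper name build_pool_kwargs and the 'cassandra.pool_size' key are made up for illustration, and I'm assuming the pool constructor accepts a pool_size keyword, which should be checked against the Pycassa version in use.

```python
def build_pool_kwargs(app_conf):
    """Translate 'cassandra.*' settings into keyword arguments for the pool.

    build_pool_kwargs and the 'cassandra.pool_size' key are hypothetical;
    'cassandra.servers' and 'cassandra.timeout' are the keys used above.
    """
    kwargs = {}

    # Comma-separated "host:port" entries; skip blanks and stray whitespace.
    servers = [s.strip() for s in app_conf.get('cassandra.servers', '').split(',')]
    servers = [s for s in servers if s]
    if servers:
        kwargs['server_list'] = servers

    # Numeric settings: ignore values that do not parse instead of crashing.
    for key, name, convert in (('cassandra.timeout', 'timeout', float),
                               ('cassandra.pool_size', 'pool_size', int)):
        if key in app_conf:
            try:
                kwargs[name] = convert(app_conf[key])
            except ValueError:
                pass  # keep the pool's default

    return kwargs


example = build_pool_kwargs({
    'cassandra.servers': 'localhost:9160, 10.0.0.2:9160,',
    'cassandra.timeout': '0.5',
    'cassandra.pool_size': 'many',  # invalid on purpose, so it is skipped
})
```

The result can then be passed along as pycassa.QueuePool(keyspace='Keyspace1', **kwargs), exactly as in the inline version.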

Getting a connection at each request

Well. There seems to be a simple way: in base.py, add something like c.conn = g.cass_pool.get() before calling WSGIController, and something like c.conn.return_to_pool() after. This is simple, and it works. But it takes a connection from the pool even when the controller doesn't need one. I have to dig a little deeper.
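To make the simple way concrete, here is a small sketch of the get/return pattern with the release placed in a finally block, so the connection goes back to the pool even when the controller raises. FakePool, FakeConnection and handle_request are stand-ins invented for the example; only get() and return_to_pool() mirror the calls mentioned above.

```python
class FakeConnection(object):
    """Stand-in for a pooled connection (hypothetical, for illustration)."""
    def __init__(self, pool):
        self._pool = pool

    def return_to_pool(self):
        self._pool.available += 1


class FakePool(object):
    """Stand-in for the pool: it only counts available connections."""
    def __init__(self, size):
        self.available = size

    def get(self):
        self.available -= 1
        return FakeConnection(self)


def handle_request(pool, controller):
    """Check out a connection, run the controller, always give it back."""
    conn = pool.get()
    try:
        return controller(conn)
    finally:
        # Runs even if the controller raises, so the pool never leaks.
        conn.return_to_pool()


def failing_controller(conn):
    raise RuntimeError("controller blew up")


pool = FakePool(size=2)
result = handle_request(pool, lambda conn: "ok")
try:
    handle_request(pool, failing_controller)
except RuntimeError:
    pass
```

After both calls the pool is back to its full size, including after the request that failed.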

Creating a connection manager

I had the simple idea of creating a class that is instantiated for each request in base.py and that automatically grabs a connection from the pool when one is requested (and releases it afterwards). It is a really simple class:

class LocalManager:
    '''Requests a connection from a Pycassa Pool when needed, and releases it at the end of the object's life'''

    def __init__(self, pool):
        '''Class constructor'''
        assert isinstance(pool, Pool)
        self._pool = pool
        self._conn = None

    def get(self):
        '''Grabs a connection from the pool if not already done, and returns it'''
        if self._conn is None:
            self._conn = self._pool.get()
        return self._conn

    def __getattr__(self, key):
        '''It's cooler to write "c.conn" than "c.get()" in the code, isn't it?'''
        # __getattr__ only runs when normal lookup has already failed, so
        # anything other than 'conn' is a genuinely missing attribute.
        if key == 'conn':
            return self.get()
        raise AttributeError(key)

    def __del__(self):
        '''Releases the connection, if needed'''
        if self._conn is not None:
            self._conn.return_to_pool()

I just added c.cass = LocalManager(g.cass_pool) before calling WSGIController in base.py and del(c.cass) after, and I'm all done.
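One caveat with relying on del: it only removes one reference, and __del__ runs when the last reference goes away, so if something else still holds the manager the connection stays checked out. A variant with an explicit release() called from a finally block is more predictable. The sketch below uses invented stand-ins (CountingPool, PooledConnection, LazyConnection, run_request) rather than Pycassa itself, and a property instead of __getattr__.

```python
class CountingPool(object):
    """Stand-in pool that records checkouts and returns (hypothetical)."""
    def __init__(self):
        self.checked_out = 0
        self.returned = 0

    def get(self):
        self.checked_out += 1
        return PooledConnection(self)


class PooledConnection(object):
    """Stand-in connection that reports back to its pool when returned."""
    def __init__(self, pool):
        self._pool = pool

    def return_to_pool(self):
        self._pool.returned += 1


class LazyConnection(object):
    """Fetches a connection on first use; release() hands it back."""
    def __init__(self, pool):
        self._pool = pool
        self._conn = None

    @property
    def conn(self):
        if self._conn is None:
            self._conn = self._pool.get()
        return self._conn

    def release(self):
        # Safe to call when no connection was ever requested.
        if self._conn is not None:
            self._conn.return_to_pool()
            self._conn = None


def run_request(pool, controller):
    """Plays the role of the code around the WSGIController call."""
    manager = LazyConnection(pool)
    try:
        return controller(manager)
    finally:
        manager.release()


def static_page(manager):
    return "static page"            # never touches manager.conn


def db_page(manager):
    first = manager.conn            # checks a connection out
    second = manager.conn           # reuses the same one
    return first is second


pool = CountingPool()
static_result = run_request(pool, static_page)
db_result = run_request(pool, db_page)
```

Only the second request touches the pool, and it takes exactly one connection and returns it.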

And it works:

conn = c.cass.conn
cf = pycassa.ColumnFamily(conn, 'TestCF')
print cf.get('foo')

\o/

I don't know if this is the best way to do this. If not, please let me know =) Also, I still don't understand the "synchronization" remark in the Pycassa source code: is it needed in my case, and if so, what should I do to avoid problems?

Thanks.

Pierre

A: 

Well. I worked a little more. In fact, hanging a connection manager on c was probably not a good idea, since c is supposed to be the template context. Additionally, opening one connection per thread is not really a big deal. Opening one connection per request would be.

I ended up with just pycassa.connect_thread_local() in app_globals, and there I go.
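To illustrate the general idea of a per-thread connection (this is not Pycassa's code, just a sketch with the standard threading.local and a made-up open_connection stand-in): each thread opens its connection on first use and then reuses it for every request it handles. Since a connection kept this way is never seen by another thread, it needs no locking of its own.

```python
import threading

_local = threading.local()
opened = []                      # records every connection that gets opened
opened_lock = threading.Lock()   # protects the shared list, not the connections


def open_connection():
    """Hypothetical stand-in for actually connecting to the server."""
    conn = object()
    with opened_lock:
        opened.append(conn)
    return conn


def get_connection():
    """Return this thread's connection, opening it on first use only."""
    if not hasattr(_local, 'conn'):
        _local.conn = open_connection()
    return _local.conn


def worker(requests):
    # Several "requests" handled by one thread reuse the same connection.
    for _ in range(requests):
        get_connection()


threads = [threading.Thread(target=worker, args=(5,)) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Three threads handling five requests each end up opening three connections in total, not fifteen.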

Pierre