views: 349 · answers: 2
I have a Pylons-based web application which connects via SQLAlchemy (v0.5) to a Postgres database. For security, rather than follow the typical pattern of simple web apps (as seen in just about all tutorials), I'm not using a generic Postgres user (e.g. "webapp"); instead I require users to enter their own Postgres userid and password, and use those credentials to establish the connection. That means we get the full benefit of Postgres security.

Complicating things still further, there are two separate databases to connect to. Although they're currently in the same Postgres cluster, they need to be able to move to separate hosts at a later date.

We're using SQLAlchemy's declarative package, though I can't see that this has any bearing on the matter.

Most examples of SQLAlchemy show trivial approaches such as setting up the metadata once, at application startup, with a generic database userid and password that is used throughout the web application. This is usually done with metadata.bind = create_engine(), sometimes even at module level in the database model files.

My question is: how can we defer establishing the connections until the user has logged in, and then (of course) re-use those connections, or re-establish them using the same credentials, for each subsequent request?

We have this working, we think, but not only am I uncertain of its safety, it also looks incredibly heavy-weight for the situation.

Inside the __call__ method of the BaseController we retrieve the userid and password from the web session, call sqlalchemy create_engine() once for each database, then call a routine which calls Session.bind_mapper() repeatedly, once for each table that may be referenced on each of those connections, even though any given request usually references only one or two tables. It looks something like this:

# in lib/base.py on the BaseController class
def __call__(self, environ, start_response):

    # note: web session contains {'username': XXX, 'password': YYY}
    url1 = 'postgres://%(username)s:%(password)s@server1/finance' % session
    url2 = 'postgres://%(username)s:%(password)s@server2/staff' % session

    finance = create_engine(url1)
    staff = create_engine(url2)
    db_configure(staff, finance)  # see below
    ... etc

# in another file

Session = scoped_session(sessionmaker())

def db_configure(staff, finance):
    s = Session()

    from db.finance import Employee, Customer, Invoice
    for c in [
        Employee,
        Customer,
        Invoice,
        ]:
        s.bind_mapper(c, finance)

    from db.staff import Project, Hour
    for c in [
        Project,
        Hour,
        ]:
        s.bind_mapper(c, staff)

    s.close()  # prevents leaking connections between sessions?

So the create_engine() calls occur on every request... I can see that being needed, and the Connection Pool probably caches them and does things sensibly.

But calling Session.bind_mapper() once for each table, on every request? Seems like there has to be a better way.

Obviously, since a desire for strong security underlies all this, we don't want any chance that a connection established for a high-security user will inadvertently be used in a later request by a low-security user.

A: 

I would look at the connection pooling and see if you can't find a way to have one pool per user. You can dispose() of the pool when the user's session has expired.

gnibbler
+2  A: 

Binding global objects (mappers, metadata) to a user-specific connection is not a good idea, nor is using a scoped session. I suggest creating a new session for each request and configuring it to use user-specific connections. The following sample assumes that you use separate metadata objects for each database:

binds = {}

finance_engine = create_engine(url1)
binds.update(dict.fromkeys(finance_metadata.sorted_tables, finance_engine))
# The following line is also required when mappers span multiple tables
# (e.g. joined-table inheritance) due to a bug (or misfeature) in
# SQLAlchemy 0.5.4. This issue may be fixed in newer versions.
binds.update(dict.fromkeys([Employee, Customer, Invoice], finance_engine))

staff_engine = create_engine(url2)
binds.update(dict.fromkeys(staff_metadata.sorted_tables, staff_engine))
# See comment above.
binds.update(dict.fromkeys([Project, Hour], staff_engine))

session = sessionmaker(binds=binds)()
Denis Otkidach
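An illustrative per-request pattern for the `binds` mapping above (not part of the original answer; the helper name is invented) might look like this: build a throwaway session bound to this user's engines, do the request's work, and close the session so that no connection state survives into another user's request.

```python
# Hypothetical per-request helper around sessionmaker(binds=...).
from sqlalchemy.orm import sessionmaker


def run_in_session(binds, work):
    """Run work(session) in a fresh session bound to `binds`, then close it."""
    session = sessionmaker(binds=binds)()
    try:
        return work(session)
    finally:
        session.close()  # returns connections to their per-user pools
```

Creating the session inside the request, rather than sharing a scoped session, is what closes the security hole by design: there is simply no session object left over for the next user to inherit.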
@Denis, we do have separate metadatas, one for each of the two databases. Given that, are both binds.update() calls required for each database, as in your example, or could we get by with only the ones that use xxx_metadata.sorted_tables? I guess I'm hoping to find something that ties the bind to the metadata, but on a per-request basis... I should have mentioned that.
Peter Hansen
Metadata is global. A global binding is never ideal, but it doesn't lead to problems while the engine is constant. Using a variable global engine would require ugly hacks along the lines of `threading.local()`, which are very bad and error-prone. Using a session that survives the request can potentially leak objects from one user to another. While you could write code to protect against that, it's better to choose a design that doesn't have such a hole in the first place.
Denis Otkidach
I don't mean setting metadata.bind, of course, because as you say that's global. I mean using the fact that the metadata object already knows about all its associated tables. Why do we have to list [Employee, Customer, Invoice] etc. explicitly, instead of just saying (pseudo-code) `sessionmaker(binds={finance_metadata: finance_engine, staff_metadata: staff_engine})`? I would have thought there would be some support for that level of indirection, rather than having to specify each table individually.
Peter Hansen
Okay, this works for me, and in fact without the second in each pair of binds.update() calls... I don't know what we'll lose by not having those, but I'm sure a traceback will tell us if it's really needed. The most helpful parts for me were the "binds" argument (I saw only "bind"), and the idea of putting this in the call to sessionmaker() and getting a new session class each time. It's still just as "heavy-weight" behind the scenes, but after examining SA's session.py it's clear that it supports only very low (one bind) or very high granularity. Thanks Denis!
Peter Hansen
You are right, using metadata directly in the `binds` parameter would be a nice option, but it doesn't work out of the box: you have to override the `Session.get_bind()` method to support it. I think it's worth filing a feature request. Passing the model list in addition to the tables was required when using joined-table inheritance in SQLAlchemy 0.5.4 due to a bug in `get_bind()`, but it might already be fixed.
Denis Otkidach
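The `get_bind()` override mentioned in the last comment might look something like the sketch below. This is an invented illustration, not SQLAlchemy API: `MetadataBoundSession` and the `metadata_binds` argument are hypothetical names, and it is written against a modern SQLAlchemy, so the 0.5-era `get_bind()` signature may differ slightly.

```python
# Hypothetical Session subclass whose binds are keyed by whole MetaData
# objects rather than by individual tables or mapped classes.
from sqlalchemy.orm import Session


class MetadataBoundSession(Session):
    def __init__(self, metadata_binds=None, **kwargs):
        self.metadata_binds = metadata_binds or {}  # {MetaData: Engine}
        super(MetadataBoundSession, self).__init__(**kwargs)

    def get_bind(self, mapper=None, clause=None, **kwargs):
        # Route by the MetaData that owns the mapped table or the clause.
        if mapper is not None:
            metadata = mapper.local_table.metadata
            if metadata in self.metadata_binds:
                return self.metadata_binds[metadata]
        if clause is not None:
            metadata = getattr(clause, 'metadata', None)
            if metadata in self.metadata_binds:
                return self.metadata_binds[metadata]
        return super(MetadataBoundSession, self).get_bind(
            mapper=mapper, clause=clause, **kwargs)
```

With `sessionmaker(class_=MetadataBoundSession, metadata_binds={finance_metadata: finance_engine, staff_metadata: staff_engine})`, each request could then get a fresh session routed per database without listing every table.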