views:

34

answers:

2

Is there a way in sqlalchemy to turn off declarative's polymorphic join loading, in a single query? Most of the time it's nice, but I have:

class A(Base) : 
   discriminator = Column('type', mysql.INTEGER(1), index=True, nullable=False)
   __mapper_args__ = { 'polymorphic_on' : discriminator }
   id = Column(Integer, primary_key=True)
   p = Column(Integer)

class B(A) : 
   __mapper_args__ = { 'polymorphic_identity' : 0 }
   id = Column(Integer, primary_key=True)
   x = Column(Integer)

class C(A) : 
   __mapper_args__ = { 'polymorphic_identity' : 1 }
   id = Column(Integer, primary_key=True)
   y = Column(String)

I want to make a query such that I get all A.ids for which B.x > 10, if that A is actually a B, or where C.y == 'blah', if that A is actually a C, all ordered by p.

To do it iteratively, I'm starting just with the first part - "get all A.id for which B.x > 10 if that A is actually a B". So I thought I would start with an outer join:

session.query(A.id).outerjoin((B, B.id == A.id)).filter(B.x > 10)

... except there seems to be no way to avoid having that outerjoin((B, B.id == A.id)) clause generate a full join of everything in A to everything in B within a subselect. If B doesn't inherit from A, then that doesn't happen, so I'm thinking it's the polymorphic declarative code generation that does that. Is there a way to turn that off? Or to force the outerjoin to do what I want?

What I want is something like this:

select a.id from A a left outer join B b on b.id == a.id where b.x > 10

but instead I get something like:

select a.id from A a left outer join (select B.id, B.x, A.id from B inner join A on B.id == A.id)

... as an aside, if it's not possible, then is the latter less efficient than the former? Will the sql engine actually perform that inner join, or will it elide it?

+1  A: 

You should use with_polymorphic() instead of outerjoin(), which seems to return the expected results:

session.query(A).with_polymorphic(B).filter(B.x > 10).all()
# BEGIN
# SELECT "A".type AS "A_type", "A".id AS "A_id", "A".p AS "A_p", "B".id AS "B_id", "B".x AS "B_x" 
# FROM "A" LEFT OUTER JOIN "B" ON "A".id = "B".id 
# WHERE "B".x > ?
# (10,)
# Col ('A_type', 'A_id', 'A_p', 'B_id', 'B_x')

Compared to:

session.query(A.id).outerjoin((B, B.id == A.id)).filter(B.x > 10)
# BEGIN
# SELECT "A".id AS "A_id" 
# FROM "A" LEFT OUTER JOIN (SELECT "A".type AS "A_type", "A".id AS "A_id", "A".p AS "A_p", "B".id AS "B_id", "B".x AS "B_x" 
# FROM "A" JOIN "B" ON "A".id = "B".id) AS anon_1 ON anon_1."A_id" = "A".id 
# WHERE anon_1."B_x" > ?
# (10,)
# Col ('A_id',)

The code I used to test this, in case anybody wants to test this neat bit of SQLAlchemy:

#!/usr/bin/env python
import logging
from sqlalchemy import *
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class A(Base) :
   __mapper_args__ = { 'polymorphic_on' : discriminator }
   __tablename__ = 'A'

   id = Column(Integer, primary_key=True)
   discriminator = Column('type', Integer, index=True, nullable=False)
   p = Column(Integer)

class B(A) :
   __mapper_args__ = { 'polymorphic_identity' : 0 }
   __tablename__ = 'B'

   id = Column(Integer, ForeignKey('A.id'), primary_key=True)
   x = Column(Integer)

class C(A) :
   __mapper_args__ = { 'polymorphic_identity' : 1 }
   __tablename__ = 'C'

   id = Column(Integer, ForeignKey('A.id'), primary_key=True)
   y = Column(String)

meta = Base.metadata
meta.bind = create_engine('sqlite://')
meta.create_all()

Session = sessionmaker()
Session.configure(bind=meta.bind)
session = Session()

log = logging.getLogger('sqlalchemy')
log.addHandler(logging.StreamHandler())
log.setLevel(logging.DEBUG)

session.query(A.id).outerjoin((B, B.id == A.id)).filter(B.x > 10).all()
session.query(A).with_polymorphic(B).filter(B.x > 10).all()

I ran this on Python 2.7 with SQLAlchemy 0.6.4.

Hao Lian
Is there a way to only return A.id, not everything in A and B? When I do session.query(A.id).with_polymorphic(B)... I get "No primary mapper set up for this Query" exception...
Colin
I couldn't figure out how to call with_polymorphic() without a primary query mapper after playing around with it and reading the docs/code, which is all sorts of twisty. I think this might be a question better answered by the sqlalchemy mailing list.
Hao Lian
+1  A: 

You could try building the queries for each subclass individually, then unioning them together. When querying B.id, SQLAlchemy implicitly joins the superclass and returns A.id instead, so taking the union of selects for B.id and C.id only returns a single column.

>>> b_query = session.query(B.id).filter(B.x > 10)
>>> c_query = session.query(C.id).filter(C.y == 'foo')
>>> print b_query.union(c_query)
SELECT anon_1."A_id" AS "anon_1_A_id" 
FROM (SELECT "A".id AS "A_id" 
FROM "A" JOIN "B" ON "A".id = "B".id 
WHERE "B".x > ? UNION SELECT "A".id AS "A_id" 
FROM "A" JOIN "C" ON "A".id = "C".id 
WHERE "C".y = ?) AS anon_1

You still get a subselect, but only a single "layer" of joins - the outer select is just renaming the column.

dhaffey
Great idea! A quick tangent you might know the answer to: if I place an order_by(A.p) and a limit(SOMENUM), it still works, and generates what I expect. But, will that be super inefficient, if SOMENUM << count(B)+count(C)? If I order_by and limit selection from a single table, I'd expect the engine NOT to extract and sort everything (right?), but if it's a union that's being ordered and limited? EXPLAINing the generated query does seem to result in it scanning every row that matches the filters... any way around that? Maybe this should be a separate question...
Colin
You could try repeating the `limit` and `order_by` clauses for the individual subclass queries. That shouldn't change the result, since for a limit of *N* you'll use at most the top *N* rows from each subclass table.
dhaffey