views: 359

answers: 3

I have a complex network of objects being spawned from a sqlite database using sqlalchemy ORM mappings. I have quite a few deeply nested loops like:

for parent in owner.collection:
    for child in parent.collection:
        for foo in child.collection:
            # do lots of calcs with foo.property
            ...

My profiling is showing me that the sqlalchemy instrumentation is taking a lot of time in this use case.

The thing is: I don't ever change the object model (mapped properties) at runtime, so once they are loaded I don't NEED the instrumentation, or indeed any sqlalchemy overhead at all. After much research, I'm thinking I might have to clone a 'pure python' set of objects from my already loaded 'instrumented objects', but that would be a pain.

Performance is really crucial here (it's a simulator), so maybe writing those layers as C extensions using sqlite api directly would be best. Any thoughts?

A: 

Try using a single query with JOINs instead of the python loops.

ebo
Thanks, but isn't the point of the ORM that those containers will be intelligently populated for me? I'd hate to lose that benefit. I've also done some limited tests, and it can actually be slower to run a big query and process the ResultProxy row by row, at which point I'm still paying for 'foo.property' access.
CarlS
The ORM stuff is just a convenience to make it easier to work with rdbms in an object oriented way. It's not there to take the relational out of relational dbs.
ebo
A: 

You should be able to disable lazy loading on the relationships in question and sqlalchemy will fetch them all in a single query.
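
A sketch of what that configuration can look like, using hypothetical Parent/Child models (lazy='joined' on the relationship is one way to disable lazy loading; joinedload() options on the query are another):

```python
from sqlalchemy import create_engine, Column, Integer, ForeignKey
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()

class Parent(Base):
    __tablename__ = 'parent'
    id = Column(Integer, primary_key=True)
    # lazy='joined' disables per-access lazy loading: the children come
    # back in the same JOINed SELECT that fetches the parents
    children = relationship('Child', lazy='joined')

class Child(Base):
    __tablename__ = 'child'
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey('parent.id'))
    value = Column(Integer)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add(Parent(children=[Child(value=1), Child(value=2)]))
session.commit()

# one query; iterating parent.children issues no further SQL
parent = session.query(Parent).first()
print(sorted(c.value for c in parent.children))  # -> [1, 2]
```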

Travis Bradshaw
It's not the speed of the query so much as the simple overhead of doing many thousands of 'instrumented' accesses to object properties, i.e. 'foo.property'.
CarlS
This usage pattern, when lazy loaded, will often generate a separate select statement for every iteration of every loop. (Usually visible if you turn on SQL output during test runs.) That's why my first response was this.
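
For what it's worth, turning on SQL output is just a flag on the engine; for example, echo=True logs every statement, which makes a lazy-load SELECT firing inside a loop easy to spot:

```python
from sqlalchemy import create_engine, text

# echo=True makes the engine log every SQL statement it emits
engine = create_engine('sqlite://', echo=True)
with engine.connect() as conn:
    conn.execute(text('select 1'))  # this statement appears in the log
```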
Travis Bradshaw
ok, I'll double check this: last time I had debugging on I remember seeing a bunch of SQL upfront but none during the loops themselves. I should point out that I'm writing a monte-carlo simulator, so these loops are being run 100000's of times (I need to check that the SQL to fetch the containers is only being done once).
CarlS
Ah, that's great then. SQLAlchemy must be fetching them all as a result of iterating over your `.collection` attributes. Generally, my "step one" for all troubleshooting that starts in the form of "doing something with SQLAlchemy that is slow" is to turn on SQL output to make sure that it's doing what I think it's doing. If it is, then I move on. If it isn't, then it's time to tweak the algorithm or the mappers.
Travis Bradshaw
Also, while I absolutely love SQLAlchemy, if performance is the most important constraint for your software (above, say, speed of development and ease of maintenance), then perhaps using an abstraction library like SQLAlchemy isn't the right tool for the job.
Travis Bradshaw
Well, not so fast :) As I said, last time I checked this wasn't a problem, but your comment will make me double-check. I believe the SQLAlchemy ORM has given me a lot of leverage getting a complex object graph up and running; my only complaint was the instrumentation. I use the logging regularly, usually before the profiling step. I'm hoping that by hacking the mapper to produce non-instrumented plain python objects as described above, I can wring some speed out. Failing that, it's probably off to a C extension, using the sqlite api directly into a bunch of c structs.
CarlS
You can query single columns and sqlalchemy will provide you with a list of tuples instead of objects with instrumented properties.
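A minimal sketch of that approach, with a hypothetical Foo model: querying columns rather than the entity yields plain row tuples, so the hot loop never touches an instrumented attribute.

```python
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Foo(Base):  # hypothetical mapped class, for illustration only
    __tablename__ = 'foo'
    id = Column(Integer, primary_key=True)
    name = Column(String(32))

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all([Foo(name='a'), Foo(name='b')])
session.commit()

# querying columns instead of the entity returns plain row tuples
rows = session.query(Foo.id, Foo.name).order_by(Foo.id).all()
print([tuple(r) for r in rows])  # -> [(1, 'a'), (2, 'b')]
```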
ebo
A: 

If you reference a single attribute of a single instance lots of times, a simple trick is to store it in a local variable.
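
A toy illustration of the hoisting trick (SimpleNamespace stands in for an instrumented instance; the saving comes from paying for the expensive attribute lookup once instead of on every iteration):

```python
from types import SimpleNamespace

foo = SimpleNamespace(property=2.0)  # stand-in for an instrumented instance

# slow form: the (potentially instrumented) attribute lookup repeats
total = 0.0
for i in range(1000):
    total += foo.property * i

# hoisted form: do the lookup once, before the loop
p = foo.property
total2 = 0.0
for i in range(1000):
    total2 += p * i

print(total == total2)  # -> True
```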

If you want a way to create cheap pure python clones, share the dict object with the original object:

class CheapClone(object):
    def __init__(self, original):
        # share the attribute dict instead of copying it
        self.__dict__ = original.__dict__

Creating a copy like this costs about half as much as a single instrumented attribute access, and attribute lookups on the clone are as fast as normal Python attribute access.
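
One caveat worth knowing before trying this: because the dict is shared rather than copied, writes through the clone are visible on the original and vice versa. A toy demonstration (CheapClone restated, with a plain class standing in for a mapped instance):

```python
class CheapClone(object):
    def __init__(self, original):
        # share the attribute dict instead of copying it
        self.__dict__ = original.__dict__

class Original(object):  # plain stand-in for a mapped instance
    pass

orig = Original()
orig.value = 42

clone = CheapClone(orig)
print(clone.value)  # -> 42, a plain dict lookup, no instrumentation

clone.value = 7     # the dict is shared, so the original sees the write
print(orig.value)   # -> 7
```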

There might also be a way to have the mapper create instances of an uninstrumented class instead of the instrumented one. If I have some time, I might take a look how deeply ingrained is the assumption that populated instances are of the same type as the instrumented class.


Found a quick and dirty way that seems to at least somewhat work on 0.5.8 and 0.6. I didn't test it with inheritance or other features that might interact badly. Also, this touches some non-public APIs, so beware of breakage when changing versions.

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm.attributes import ClassManager, instrumentation_registry

class ReadonlyClassManager(ClassManager):
    """Enables configuring a mapper to return instances of uninstrumented 
    classes instead. To use add a readonly_type attribute referencing the
    desired class to use instead of the instrumented one."""
    def __init__(self, class_):
        ClassManager.__init__(self, class_)
        self.readonly_version = getattr(class_, 'readonly_type', None)
        if self.readonly_version:
            # default instantiation logic doesn't know to install finders
            # for our alternate class
            instrumentation_registry._dict_finders[self.readonly_version] = self.dict_getter()
            instrumentation_registry._state_finders[self.readonly_version] = self.state_getter()

    def new_instance(self, state=None):
        if self.readonly_version:
            instance = self.readonly_version.__new__(self.readonly_version)
            self.setup_instance(instance, state)
            return instance
        return ClassManager.new_instance(self, state)

Base = declarative_base()
Base.__sa_instrumentation_manager__ = ReadonlyClassManager

Usage example:

class ReadonlyFoo(object):
    pass

class Foo(Base, ReadonlyFoo):
    __tablename__ = 'foo'
    id = Column(Integer, primary_key=True)
    name = Column(String(32))

    readonly_type = ReadonlyFoo

assert type(session.query(Foo).first()) is ReadonlyFoo
Ants Aasma
Unfortunately the usage pattern is many calculations across many small objects, so the local caching is not so helpful. The cloning idea really sounds like the way to go, thanks for the quick tip. Your final comment is exactly what I'd like: ask the mapper to create an 'uninstrumented' class, because I know it's read-only.
CarlS
Thanks a bunch! I can't wait to try this out.
CarlS
I've done some initial work on the mapper hack suggested and the time differences are encouraging. For a simple loop:

for i in xrange(500000): foo = readonlyobj.attr_bar

with normal instrumentation: 2.663 secs
with readonly mapper hack: 0.078 secs

That's a very significant result imo, so thanks again. I'm still trying to really understand how it works and it's proving a great way to learn sqlalchemy in a bit more depth.
CarlS