views: 359

answers: 3

I have a complex network of objects being spawned from a sqlite database using sqlalchemy ORM mappings. I have quite a few deeply nested loops like:

for parent in owner.collection:
    for child in parent.collection:
        for foo in child.collection:
            # do lots of calcs with foo.property
            ...

My profiling is showing me that the sqlalchemy instrumentation is taking a lot of time in this use case.

The thing is: I don't ever change the object model (mapped properties) at runtime, so once they are loaded I don't NEED the instrumentation, or indeed any sqlalchemy overhead at all. After much research, I'm thinking I might have to clone a 'pure python' set of objects from my already loaded 'instrumented objects', but that would be a pain.

Performance is really crucial here (it's a simulator), so maybe writing those layers as C extensions using sqlite api directly would be best. Any thoughts?

A: 

Try using a single query with JOINs instead of the python loops.

ebo
Thanks, but isn't the point of the ORM that those containers will be intelligently populated for me? I'd hate to lose that benefit. I've also done some limited tests, and it can actually be slower to run a big query and process the ResultProxy row by row, at which point I'm still paying for 'foo.property' access.
CarlS
The ORM stuff is just a convenience to make it easier to work with rdbms in an object oriented way. It's not there to take the relational out of relational dbs.
ebo
A: 

You should be able to disable lazy loading on the relationships in question and sqlalchemy will fetch them all in a single query.
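
A sketch of what that configuration can look like, using hypothetical Parent/Child models (lazy='joined' on the relationship is one way to disable lazy loading; joinedload() options on the query are another):

```python
from sqlalchemy import create_engine, Column, Integer, ForeignKey
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()

class Parent(Base):
    __tablename__ = 'parent'
    id = Column(Integer, primary_key=True)
    # lazy='joined' disables per-access lazy loading: the children come
    # back in the same JOINed SELECT that fetches the parents
    children = relationship('Child', lazy='joined')

class Child(Base):
    __tablename__ = 'child'
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey('parent.id'))
    value = Column(Integer)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add(Parent(children=[Child(value=1), Child(value=2)]))
session.commit()

# one query; iterating parent.children issues no further SQL
parent = session.query(Parent).first()
print(sorted(c.value for c in parent.children))  # -> [1, 2]
```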

Travis Bradshaw
It's not the speed of the query so much as the simple overhead of doing many thousands of 'instrumented' accesses to object properties, i.e. 'foo.property'.
CarlS
This usage pattern, when lazy loaded, will often generate a separate select statement for every iteration of every loop. (Usually visible if you turn on SQL output during test runs.) That's why my first response was this.
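
For what it's worth, turning on SQL output is just a flag on the engine; for example, echo=True logs every statement, which makes a lazy-load SELECT firing inside a loop easy to spot:

```python
from sqlalchemy import create_engine, text

# echo=True makes the engine log every SQL statement it emits
engine = create_engine('sqlite://', echo=True)
with engine.connect() as conn:
    conn.execute(text('select 1'))  # this statement appears in the log
```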
Travis Bradshaw
ok, I'll double check this: last time I had debugging on I remember seeing a bunch of SQL upfront but none during the loops themselves. I should point out that I'm writing a monte-carlo simulator, so these loops are being run 100000's of times (I need to check that the SQL to fetch the containers is only being done once).
CarlS
Ah, that's great then. SQLAlchemy must be fetching them all as a result of iterating over your `.collection` attributes. Generally, my "step one" for all troubleshooting that starts in the form of "doing something with SQLAlchemy that is slow" is to turn on SQL output to make sure that it's doing what I think it's doing. If it is, then I move on. If it isn't, then it's time to tweak the algorithm or the mappers.
Travis Bradshaw
Also, while I absolutely love SQLAlchemy, if performance is the most important constraint for your software (above, say, speed of development and ease of maintenance), then perhaps using an abstraction library like SQLAlchemy isn't the right tool for the job.
Travis Bradshaw
Well, not so fast :) As I said, last time I checked this wasn't a problem, but your comment will make me double-check. I believe the SQLAlchemy ORM has given me a lot of leverage getting a complex object graph up and running; my only complaint was the instrumentation. I use the logging regularly, usually before the profiling step. I'm hoping that by hacking the mapper to produce non-instrumented plain python objects as described above, I can wring some speed out. Failing that, it's probably off to a C extension, using the sqlite api directly into a bunch of c structs.
CarlS
You can query single columns and sqlalchemy will provide you with a list of tuples instead of objects with instrumented properties.
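A minimal sketch of that approach, with a hypothetical Foo model: querying columns rather than the entity yields plain row tuples, so the hot loop never touches an instrumented attribute.

```python
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Foo(Base):  # hypothetical mapped class, for illustration only
    __tablename__ = 'foo'
    id = Column(Integer, primary_key=True)
    name = Column(String(32))

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all([Foo(name='a'), Foo(name='b')])
session.commit()

# querying columns instead of the entity returns plain row tuples
rows = session.query(Foo.id, Foo.name).order_by(Foo.id).all()
print([tuple(r) for r in rows])  # -> [(1, 'a'), (2, 'b')]
```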
ebo
A: 

If you reference a single attribute of a single instance lots of times, a simple trick is to store it in a local variable.
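
A toy illustration of the hoisting trick (SimpleNamespace stands in for an instrumented instance; the saving comes from paying for the expensive attribute lookup once instead of on every iteration):

```python
from types import SimpleNamespace

foo = SimpleNamespace(property=2.0)  # stand-in for an instrumented instance

# slow form: the (potentially instrumented) attribute lookup repeats
total = 0.0
for i in range(1000):
    total += foo.property * i

# hoisted form: do the lookup once, before the loop
p = foo.property
total2 = 0.0
for i in range(1000):
    total2 += p * i

print(total == total2)  # -> True
```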

If you want a way to create cheap pure python clones, share the dict object with the original object:

class CheapClone(object):
    def __init__(self, original):
        # share the attribute dict instead of copying it
        self.__dict__ = original.__dict__

Creating a copy like this costs about half as much as a single instrumented attribute access, and attribute lookups on the clone are as fast as normal Python attribute access.
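
One caveat worth knowing before trying this: because the dict is shared rather than copied, writes through the clone are visible on the original and vice versa. A toy demonstration (CheapClone restated, with a plain class standing in for a mapped instance):

```python
class CheapClone(object):
    def __init__(self, original):
        # share the attribute dict instead of copying it
        self.__dict__ = original.__dict__

class Original(object):  # plain stand-in for a mapped instance
    pass

orig = Original()
orig.value = 42

clone = CheapClone(orig)
print(clone.value)  # -> 42, a plain dict lookup, no instrumentation

clone.value = 7     # the dict is shared, so the original sees the write
print(orig.value)   # -> 7
```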

There might also be a way to have the mapper create instances of an uninstrumented class instead of the instrumented one. If I have some time, I might take a look how deeply ingrained is the assumption that populated instances are of the same type as the instrumented class.


Found a quick and dirty way that seems to at least somewhat work on 0.5.8 and 0.6. I didn't test it with inheritance or other features that might interact badly. Also, this touches some non-public APIs, so beware of breakage when changing versions.

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm.attributes import ClassManager, instrumentation_registry

class ReadonlyClassManager(ClassManager):
    """Enables configuring a mapper to return instances of uninstrumented 
    classes instead. To use add a readonly_type attribute referencing the
    desired class to use instead of the instrumented one."""
    def __init__(self, class_):
        ClassManager.__init__(self, class_)
        self.readonly_version = getattr(class_, 'readonly_type', None)
        if self.readonly_version:
            # default instantiation logic doesn't know to install finders
            # for our alternate class
            instrumentation_registry._dict_finders[self.readonly_version] = self.dict_getter()
            instrumentation_registry._state_finders[self.readonly_version] = self.state_getter()

    def new_instance(self, state=None):
        if self.readonly_version:
            instance = self.readonly_version.__new__(self.readonly_version)
            self.setup_instance(instance, state)
            return instance
        return ClassManager.new_instance(self, state)

Base = declarative_base()
Base.__sa_instrumentation_manager__ = ReadonlyClassManager

Usage example:

class ReadonlyFoo(object):
    pass

class Foo(Base, ReadonlyFoo):
    __tablename__ = 'foo'
    id = Column(Integer, primary_key=True)
    name = Column(String(32))

    readonly_type = ReadonlyFoo

assert type(session.query(Foo).first()) is ReadonlyFoo
Ants Aasma
Unfortunately the usage pattern is many calculations across many small objects, so the local caching is not so helpful. The cloning idea really sounds like the way to go, thanks for the quick tip. Your final comment is exactly what I'd like: ask the mapper to create an 'uninstrumented' class, because I know it's read-only.
CarlS
Thanks a bunch! I can't wait to try this out.
CarlS
I've done some initial work on the mapper hack suggested and the time differences are encouraging. For a simple loop:

for i in xrange(500000): foo = readonlyobj.attr_bar

with normal instrumentation: 2.663 secs
with readonly mapper hack: 0.078 secs

That's a very significant result imo, so thanks again. I'm still trying to really understand how it works and it's proving a great way to learn sqlalchemy in a bit more depth.
CarlS