ansaurus

Question

In my MapperExtension.create_instance, how can I extract individual row data by column name?

Answer 1


The row accepts Column objects as indexes:

row[MyClass.some_element.__clause_element__()]

but that will only get you as far as the classes and aliased() constructs you have access to on the outside. It's very likely that would be all you'd need for that part of the issue (even though ultimately the idea won't work, read on).

If your statement has had subqueries wrapped around it, from using things like from_self() or join() to a polymorphic target, the create_instance() method doesn't give you access to the translation functions you'd need to accomplish that.

If you're trying to get at rows that are linked to an eagerload(), that's totally not something you should be doing. eagerload() is about optimizing the load of collections. If you want your query to join between two tables and you're looking to filter on the joined table, use join().

But above all, create_instance() is from version 0.1 of SQLAlchemy and I doubt anyone uses it for anything, and it has no capability to say, "skip this row". It has to return something or the mapper will create the instance on its own. So no matter how well you can interpret the row, there's no hook for what you want to do here.

If I really wanted to do such a thing, it would likely be easier to monkeypatch the "fetchall()" method of the returned ResultProxy to filter rows, and send it to Query.instances(). Any result can be sent to this method. Although, if the Query has done translations and such on the mapped selectables, it would need the original QueryContext as well to know how to translate. But this is nothing I'd be bothering with either.

Overall, if speed is so critical throughout all of this that merely creating the object makes that big of a difference, I'd make it so that I don't need the mapped objects at all for the whole operation, or I'd use caching, or generate the objects I need manually from a result set. I also would make sure that I have access to all the targeted columns in the selectable I'm using so I can re-fetch from result rows, which means I either don't use automatic-subquery/alias generation functions in the ORM, or I use the expression language directly (if you're really hungry for speed and are in the mood to write large tracts of optimizing code, you should probably just be using the expression language).
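As a rough illustration of "generate the objects I need manually from a result set", here is a minimal sketch. It uses the standard library's sqlite3 module in place of SQLAlchemy so that it runs on its own; the orders table, the Order tuple and the wanted() rule are all hypothetical names made up for the example. The point is only that the rule is checked against the raw row, and an object is built only for rows that pass.

```python
import sqlite3
from collections import namedtuple

# Hypothetical lightweight "object" standing in for a mapped class.
Order = namedtuple("Order", ["id", "customer", "total"])


def wanted(row):
    """Hypothetical business rule: keep only orders with a total above 100."""
    return row["total"] > 100


def load_orders(conn):
    # sqlite3.Row lets each row be indexed by column name.
    conn.row_factory = sqlite3.Row
    cursor = conn.execute("SELECT id, customer, total FROM orders ORDER BY id")
    # Test the rule on the raw row; construct an object only if it passes.
    return [
        Order(row["id"], row["customer"], row["total"])
        for row in cursor
        if wanted(row)
    ]


# Small in-memory database with made-up data, just to exercise the function.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "ann", 50.0), (2, "bob", 150.0), (3, "cy", 300.0)],
)

orders = load_orders(conn)
print(orders)
```

With SQLAlchemy the same shape would apply to rows returned by executing an expression-language select(), since those rows also allow access by column.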

So the real questions you have to ask here are:

Have you verified that the real difference in speed is in creating the object from the row, i.e. not in fetching the row, or fetching its columns, etc.?
Does the row just have some expensive columns that you don't need? Have you looked into deferred()?
What are these business rules, and why can't they be done in SQL, as stored procedures, etc.?
How many thousands of rows are you really skipping here, that it's so "slow" to not "skip" them?
Have you investigated techniques for having the objects already present, like in-memory caches, preloads, etc.? For many scenarios, this fits the bill.
If none of this works and you really want to hack up some home-rolled optimization code, why not use the SQL expression language directly? If ultimately you're just dealing with a view layer, result rows are quite friendly (they allow "attribute" style access and such), or you can build some quick "generate an object" routine from them. The ORM presents a very specific use case of the SQL expression language, and if you really need something much more lightweight than it, you're better off skipping it.
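For the first question in the list, one way to check where the time actually goes is to time the fetch and the object construction separately. The sketch below again uses sqlite3 from the standard library rather than SQLAlchemy, and the items table and Item class are hypothetical; it only shows the shape of the measurement, not representative numbers.

```python
import sqlite3
import time


class Item:
    """Hypothetical plain class standing in for a mapped object."""

    def __init__(self, id, name):
        self.id = id
        self.name = name


def time_fetch_and_build(conn):
    # Phase 1: fetch all raw rows from the database.
    start = time.perf_counter()
    rows = conn.execute("SELECT id, name FROM items").fetchall()
    fetch_seconds = time.perf_counter() - start

    # Phase 2: turn every row into an object.
    start = time.perf_counter()
    objects = [Item(*row) for row in rows]
    build_seconds = time.perf_counter() - start

    return fetch_seconds, build_seconds, objects


# Made-up data set, only to have something to measure.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(i, "item-%d" % i) for i in range(10000)],
)

fetch_seconds, build_seconds, objects = time_fetch_and_build(conn)
print("fetch: %.6fs, build: %.6fs" % (fetch_seconds, build_seconds))
```

If the build phase turns out to be a small share of the total, skipping object creation for some rows is unlikely to help much. Against a real ORM query, a profiler such as cProfile would give a finer breakdown than two timers.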

zzzeek 2010-09-04 03:11:40
