views:

3039

answers:

3

I'm starting a new application and looking at using an ORM -- in particular, SQLAlchemy.

Say I've got a column 'foo' in my database and I want to increment it. In straight sqlite, this is easy:

db = sqlite3.connect('mydata.sqlitedb')
cur = db.cursor()
cur.execute('update table stuff set foo = foo + 1')

I figured out the SQLAlchemy SQL-builder equivalent:

engine = sqlalchemy.create_engine('sqlite:///mydata.sqlitedb')
md = sqlalchemy.MetaData(engine)
table = sqlalchemy.Table('stuff', md, autoload=True)
upd = table.update(values={table.c.foo:table.c.foo+1})
engine.execute(upd)

This is slightly slower, but there's not much in it.

Here's my best guess for a SQLAlchemy ORM approach:

# snip definition of Stuff class made using declarative_base
# snip creation of session object
for c in session.query(Stuff):
    c.foo = c.foo + 1
session.flush()
session.commit()

This does the right thing, but it takes just under fifty times as long as the other two approaches. I presume that's because it has to bring all the data into memory before it can work with it.

Is there any way to generate the efficient SQL using SQLAlchemy's ORM? Or using any other python ORM? Or should I just go back to writing the SQL by hand?

A: 

Withough testing, I'd try:

for c in session.query(Stuff).all():
     c.foo = c.foo+1
session.commit()

(IIRC, commit() works without flush()).

I've found that at times doing a large query and then iterating in python can be up to 2 orders of magnitude faster than lots of queries. I assume that iterating over the query object is less efficient than iterating over a list generated by the all() method of the query object.

[Please note comment below - this did not speed things up at all].

Matthew Schinckel
Adding .all() and removing .flush() didn't change the time at all.
John Fouhy
A: 

If it is because of the overhead in terms of creating objects, then it probably can't be sped up at all with SA.

If it is because it is loading up related objects, then you might be able to do something with lazy loading. Are there lots of objects being created due to references? (IE, getting a Company object also gets all of the related People objects).

Matthew Schinckel
Nah, the table's all on its own. I've never used an ORM before -- is this just something they're bad at?
John Fouhy
There is an overhead due to creating Objects, but in my opinion it is worth the penalty - being able to persistently store objects in a database is awesome.
Matthew Schinckel
+12  A: 

SQLAlchemy's ORM is meant to be used together with the SQL layer, not hide it. But you do have to keep one or two things in mind when using the ORM and plain SQL in the same transaction. Basically, from one side, ORM data modifications will only hit the database when you flush the changes from your session. From the other side, SQL data manipulation statements don't affect the objects that are in your session.

So if you say

for c in session.query(Stuff).all():
    c.foo = c.foo+1
session.commit()

it will do what it says, go fetch all the objects from the database, modify all the objects and then when it's time to flush the changes to the database, update the rows one by one.

Instead you should do this:

session.execute(update(stuff_table, values={stuff_table.c.foo: stuff_table.c.foo + 1}))
session.commit()

This will execute as one query as you would expect, and because atleast the default session configuration expires all data in the session on commit you don't have any stale data issues.

In the almost-released 0.5 series you could also use this method for updating:

session.query(Stuff).update({Stuff.foo: Stuff.foo + 1})
session.commit()

That will basically run the same SQL statement as the previous snippet, but also select the changed rows and expire any stale data in the session. If you know you aren't using any session data after the update you could also add synchronize_session=False to the update statement and get rid of that select.

Ants Aasma