ansaurus

Question

Efficiently updating database using SQLAlchemy ORM

Answer 1

A:

Withough testing, I'd try:

for c in session.query(Stuff).all():
     c.foo = c.foo+1
session.commit()

(IIRC, commit() works without flush()).

I've found that at times doing a large query and then iterating in python can be up to 2 orders of magnitude faster than lots of queries. I assume that iterating over the query object is less efficient than iterating over a list generated by the all() method of the query object.

[Please note comment below - this did not speed things up at all].

Matthew Schinckel 2008-11-07 00:35:19

Adding .all() and removing .flush() didn't change the time at all.

John Fouhy 2008-11-07 01:03:33

Answer 2

A:

If it is because of the overhead in terms of creating objects, then it probably can't be sped up at all with SA.

If it is because it is loading up related objects, then you might be able to do something with lazy loading. Are there lots of objects being created due to references? (IE, getting a Company object also gets all of the related People objects).

Matthew Schinckel 2008-11-07 01:08:04

Nah, the table's all on its own. I've never used an ORM before -- is this just something they're bad at?

John Fouhy 2008-11-07 01:17:54

There is an overhead due to creating Objects, but in my opinion it is worth the penalty - being able to persistently store objects in a database is awesome.

Matthew Schinckel 2008-11-18 11:42:53

Answer 3

+12 A:

SQLAlchemy's ORM is meant to be used together with the SQL layer, not hide it. But you do have to keep one or two things in mind when using the ORM and plain SQL in the same transaction. Basically, from one side, ORM data modifications will only hit the database when you flush the changes from your session. From the other side, SQL data manipulation statements don't affect the objects that are in your session.

So if you say

for c in session.query(Stuff).all():
    c.foo = c.foo+1
session.commit()

it will do what it says, go fetch all the objects from the database, modify all the objects and then when it's time to flush the changes to the database, update the rows one by one.

Instead you should do this:

session.execute(update(stuff_table, values={stuff_table.c.foo: stuff_table.c.foo + 1}))
session.commit()

This will execute as one query as you would expect, and because atleast the default session configuration expires all data in the session on commit you don't have any stale data issues.

In the almost-released 0.5 series you could also use this method for updating:

session.query(Stuff).update({Stuff.foo: Stuff.foo + 1})
session.commit()

That will basically run the same SQL statement as the previous snippet, but also select the changed rows and expire any stale data in the session. If you know you aren't using any session data after the update you could also add synchronize_session=False to the update statement and get rid of that select.

Ants Aasma 2008-11-10 17:40:42

ansaurus

tags:

views:

answers:

Efficiently updating database using SQLAlchemy ORM

related questions