views:

99

answers:

2

Yet another newbie question..

Let's say I have an user table in declarative mode:

class User(Base):
    __tablename__ = 'user'
    id = Column(u'id', Integer(), primary_key=True)
    name = Column(u'name', String(50))

When I have a list of users identifiers, I fetch them from db with:

user_ids = [1, 2, 3, 4, 5]
users = Session.query(User).filter(User.id.in_(user_ids)).all()

I dislike using in_ because I think I learned it has bad performance on indexed fields (is that true/false?).

Anyway, is there a better way doing that query?

Thanks!

EDIT: I am using MySQL

How would I write this query using OR with sqlalchemy?

Re-EDIT: got it:

from sqlalchemy.sql.expression import or_
user_ids = [1, 2, 3, 4, 5]
clauses = or_( *[User.user_id==x for x in users] )
users = Session.query(User).filter(clauses).all()
+3  A: 

A lot of performance woes are database engine dependent. The rest of this post will deal with MySQL.

An IN() clause can have bad performance on an indexed field, but it will not in the example that you gave. Up to a certain number of user_ids, your query will be the fastest possible. However, at some point, it becomes faster to put the user_ids into a temporary table and join against it. You can see more details about the performance of IN() vs a temporary table in MySQL here.

If that list of user_ids is based on some attribute of the users (such as being admins or being invalid) then you could add a field to your User table and avoid this problem altogether.

davidfischer
+2  A: 

An alternate to using an "in" clause is to "or" the ids, i.e., where "id = 1 or id = 2 or id = 3". If there is just a few you might gain some speed going that way.

From the docs: http://www.sqlalchemy.org/docs/ormtutorial.html#common-filter-operators

from sqlalchemy import or_
filter(or_(User.name == 'ed', User.name == 'wendy'))

You don't say what DBM you are using but your administrator might be your best asset here. The best way to know what construct to use is to analyze the query and try a couple different ones so you know how well your particular database engine handles the various queries. No matter what database you are using, if it supports "or" or "in", you are probably going to gain a lot of speed using those instead of looping over all the ids you need and making individual queries.

It might turn out that worrying about whether to use "in" or another construct is not going to make a big difference to the overall speed of the app in comparison to changing other parts of the code. Database engines are getting pretty crafty about optimizing simple queries so you'll get good performance as long as your query is sensible. One of the things we have to learn about programming is to get the code working well first, then test and optimize if there are problems. Often we assume we know where the bottleneck is but a profiling tool will show we were wrong. Using a profiler and benchmark tools can help narrow down the problem areas and show what are the best ways to speed up whatever needs tweaking.

Greg