views:

1032

answers:

3

Using SQLAlchemy, I have a one to many relation with two tables - users and scores. I am trying to query the top 10 users sorted by their aggregate score over the past X amount of days.

users:  
  id  
  user_name  
  score  

scores:  
  user   
  score_amount  
  created  

My current query is:

 top_users = DBSession.query(User).options(eagerload('scores')).filter_by(User.scores.created > somedate).order_by(func.sum(User.scores).desc()).all()  

I know this is clearly not correct, it's just my best guess. However, after looking at the documentation and googling I cannot find an answer.

EDIT: Perhaps it would help if I sketched what the MySQL query would look like:

SELECT user.*, SUM(scores.amount) as score_increase FROM user LEFT JOIN scores ON scores.user_id = user.user_id WITH scores.created_at > someday ORDER BY score_increase DESC
+1  A: 

You will need to use a subquery in order to compute the aggregate score for each user. Subqueries are described here: http://www.sqlalchemy.org/docs/05/ormtutorial.html?highlight=subquery#using-subqueries

Antoine P.
A: 

I am assuming the column (not the relation) you're using for the join is called Score.user_id, so change it if this is not the case.

You will need to do something like this:

DBSession.query(Score.user_id, func.sum(Score.score_amount).label('total_score')).group_by(Score.user_id).filter(Score.created > somedate).order_by('total_score DESC')[:10]

However this will result in tuples of (user_id, total_score). I'm not sure if the computed score is actually important to you, but if it is, you will probably want to do something like this:

users_scores = []
q = DBSession.query(Score.user_id, func.sum(Score.score_amount).label('total_score')).group_by(Score.user_id).filter(Score.created > somedate).order_by('total_score DESC')[:10]
for user_id, total_score in q:
    user = DBSession.query(User)
    users_scores.append((user, total_score))

This will result in 11 queries being executed, however. It is possible to do it all in a single query, but due to various limitations in SQLAlchemy, it will likely create a very ugly multi-join query or subquery (dependent on engine) and it won't be very performant.

If you plan on doing something like this often and you have a large amount of scores, consider denormalizing the current score onto the user table. It's more work to upkeep, but will result in a single non-join query like:

DBSession.query(User).order_by(User.computed_score.desc())

Hope that helps.

Crast
yikes. no such limitations I'm aware of.
zzzeek
+3  A: 

The single-joined-row way, with a group_by added in for all user columns although MySQL will let you group on just the "id" column if you choose:

    sess.query(User, func.sum(Score.amount).label('score_increase')).\
               join(User.scores).\
               filter(Score.created_at > someday).\
               group_by(User).\
               order_by("score increase desc")

Or if you just want the users in the result:

sess.query(User).\
           join(User.scores).\
           filter(Score.created_at > someday).\
           group_by(User).\
           order_by(func.sum(Score.amount))

The above two have an inefficiency in that you're grouping on all columns of "user" (or you're using MySQL's "group on only a few columns" thing, which is MySQL only). To minimize that, the subquery approach:

subq = sess.query(Score.user_id, func.sum(Score.amount).label('score_increase')).\
                  filter(Score.created_at > someday).\
                  group_by(Score.user_id).subquery()
sess.query(User).join((subq, subq.c.user_id==User.user_id)).order_by(subq.c.score_increase)

An example of the identical scenario is in the ORM tutorial at: http://www.sqlalchemy.org/docs/05/ormtutorial.html#using-subqueries.

zzzeek
Hi thanks for the response. This works well and the documentation helps a lot. How would I gain access to the score_increase for a user? Take for instance the query is assigned to the variable top_users and I loop through each user. user.score_increase does not work, nor does user.UserScore.score_increase.
Marc
using the third query, if you iterate through sess.query(User, subq.c.score_increase), you will get tuples of (User, score_increase)
zzzeek
hmm I must be missing something here. In case it matters, I'm using Turbogears 2 and I am assigning the result of the third query to a variable top_users which is available in my templates. I then loop through -- for user in top_users: print user.user_name+' '+user.score_increase -- basically I want to show the amount the users score has increase the past x number of days. I don't understand how to access the joined data within the top_users tuple.
Marc
for user, score_increase in top_users:
zzzeek
Hi I tried that and get this error: "TypeError: 'User' object is not iterable". Perhaps this is a different question entirely?
Marc
just for clarity: "for user, score_increase in sess.query(User, subq.c.score_increase)....", i.e. you are asking for an iteration of tuples, and are iterating it. There are examples of this in the ORM tutorial.
zzzeek
Does that perform a query for every iteration? The problem is that I am iterating over an assigned variable (in this case I assigned the result of the query above to the variable top_users) in a mako template, I cannot access the session that I used to perform the query in the controller method.
Marc
you can pre iterate it by running it through list(). though i dunno what TG does but the session should be accessible within the template, it would be extremely painful to always make everything load completely before reaching the template (i.e. lazyloads dont work...ouch)
zzzeek