views:

234

answers:

2

so i have a User class

class User(db.Model):
  points = db.IntegerProperty()

so I created 1000 dummy entities on development server with points ranging from 1 to 1000

query = db.GqlQuery("SELECT * FROM User WHERE points >= 300"
                     "AND points <= 700"
                     "LIMIT 20"
                     "ORDER BY points desc")

I only want 20 results per query ( enough to fill a page). I don't need any pagination of the results.

Everything looks ok, it worked on developement server.

Question:
1. Will it work on a production server with 100,000 - 500,000 user entities? Will i experience great lag? I hope not, cos I heard that App Engine indexes the points column automatically
2. Any other optimization techniques that you can recommend?

+1  A: 

I think that it is difficult to say what kind of performance issues that you will have with such a large number of entities. This one particular query will probably be fine, but you should be aware that no datastore query can ever return more than 1000 entities, so if you need to operate on numbers larger than 1000, you will need to do it in batches, and you may want to partition them into separate entity groups.

As far as optimization goes, you may want to consider caching the results of this query and only running it when you know the information has changed or at specific intervals. If the query is for some purpose where exactly correct results are not totally critical -- say, displaying a leader board or a high score list -- you might be choose to update and cache the result once every hour or something like that.

The only other optimization that I can think of is that you can save the cycles associated with parsing that GQL statement by doing it once and saving the resulting object, either in memchache or a global variable.

Adam Crossland
There's no point in caching the GQL object - parsing is cheaper than pickling/unpickling! But using the Query interface is (slightly) faster than GQL.
Nick Johnson
I didn't mean to say that the GQL object needed to be pickled. Rather, by holding an instance of it, you could avoid having to reparse it on every request, but if Nick Johnson says you don't need to bother, then certainly don't.
Adam Crossland
+1  A: 

Your code seems fine to get the top users, but more complex queries, like finding out what's the rank of any specific user will be hard. If you need this kind of functionality too, have a look at google-app-engine-ranklist.

Ranklist is a python library for Google App Engine that implements a data structure for storing integer scores and quickly retrieving their relative ranks.

jbochi