views:

86

answers:

2

I had for the last month a bit of a problem with a quite basic datastore query. It involves 2 db.Models with one referring to the other with a db.ReferenceProperty.

The problem is that according to the admin logs the request takes about 2-4 seconds to complete. I strip it down to a bare form and a list to display the results. The put works fine, but the get accumulates (in my opinion) way to much cpu time.

#The get look like this: 
outputData['items'] = {} 
labelsData = Label.all() 
for label in labelsData: 
        labelItem = label.item.name 
        if labelItem not in outputData['items']: 
                outputData['items'][labelItem] = { 'item' : labelItem, 'labels' : [] } 
        outputData['items'][labelItem]['labels'].append(label.text) 
path = os.path.join(os.path.dirname(__file__), 'index.html') 
self.response.out.write(template.render(path, outputData)) 
#And the models: 
class Item(db.Model): 
        name = db.StringProperty() 
class Label(db.Model): 
        text = db.StringProperty() 
        lang = db.StringProperty() 
        item = db.ReferenceProperty(Item) 

I've tried to make it a number of different way ie. instead of ReferenceProperty storing all Label keys in the Item Model as a db.ListProperty.

My test data is just 10 rows in Item and 40 in Label.

So my questions: Is it a fools errand to try to optimize this since the high cpu usage is due to the problems with the datastore or have I just screwed up somewhere in the code? ..fredrik

EDIT:

I got a great response from djidjadji at the google appengine mailing list.

The new code looks like this:

outputData['items'] = {}
labelsData = Label.all().fetch(1000)
labelItems = db.get([Label.item.get_value_for_datastore(label) for label in labelsData ])
for label,labelItem in zip(labelsData, labelItems):
   name = labelItem.name
   try:
       outputData['items'][name]['labels'].append(label.text)
   except KeyError:
       outputData['items'][name] = { 'item' : name, 'labels' : [label.text] }
+2  A: 

There's certainly things you can do to optimize your code. For example, you're iterating over a query, which is less efficient than fetching the query and iterating over the results.

I'd recommend using Appstats to profile your app, and check out the Patterns of Doom series of posts.

Nick Johnson
I've used Appstats. And it's the ReferenceProperty that makes and query for each loop that is the problem. But I got some great help at the googleappengine mailing list. I didn't know how to make the same result without referring to the ReferenceProperty. The solution was in using get_value_for_datastore before the for loop.
fredrik
You've still got the iterating-over-a-query issue. Calling .fetch() instead of iterating over the query will be much more efficient. Also, see this blog post for a recipe to resolve reference properties: http://blog.notdot.net/2010/01/ReferenceProperty-prefetching-in-App-Engine
Nick Johnson
A: 

Don't just try things. That's guessing. You'll only be right some of the time. Don't ask other people to guess either, for the same reason.

Be right every time.

Just pause the code several times and look at the call stack. That will tell you exactly what's going on.

Mike Dunlavey
Unfortunately one ends up guessing when one's knowledge doesn't cover the issue at hand. It becomes even harder when one doesn't know more specific information about it then "It runs to many queries". The solution I got for google appengine mailing list was something I didn't know was possible to do. I've read about it in the documentation but didn't understand what it did. So if you say one shouldn't guess, how would you go about it?
fredrik
@fredrik: Here's what I suggest. Run your program under the pdb (http://docs.python.org/library/pdb.html). Press Ctrl-Break or Ctrl-C while it is running. Type "w(here)" to see the call stack. Understand what it is doing and why. Repeat several times. If X% time is being spent in your code, X% (+/-) stacks will show it. If in datastore, same thing. The measurement will be approximate, but the activity will be dead certain and detailed. That's what I mean by not guessing.
Mike Dunlavey