views: 1280
answers: 4
Google App Engine tells me to optimize this code. Does anybody have any ideas what I could do?

def index(request):
    user = users.get_current_user()
    return base.views.render('XXX.html', 
                 dict(profiles=Profile.gql("").fetch(limit=100), user=user))

And later in the template I do:

{% for profile in profiles %}
  <a href="/profile/{{profile.user.email}}/"><img src="{{profile.gravatarUrl}}"></a>
  <a href="/profile/{{profile.user.email}}/">{{ profile.user.nickname }}</a>
  <br/>{{ profile.shortDisplay }}
{% endfor %}

Where the methods used are:

def shortDisplay(self):
    return "%s/day; %s/week; %s days" % (self.maxPerDay, self.maxPerWeek, self.days)

def gravatarUrl(self):
    email = self.user.email().lower()
    default = "..."
    gravatar_url = "http://www.gravatar.com/avatar.php?"
    gravatar_url += urllib.urlencode({'gravatar_id':hashlib.md5(email).hexdigest(), 
        'default':default, 'size':"64"})
    return gravatar_url
+3  A: 

I would guess that computing an MD5 hash on every item, every time, is pretty costly. Better to store the gravatar email hash somewhere instead.
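For illustration, a minimal sketch of that idea, assuming a db.Model-based Profile with a user property (the gravatar_hash property and the put() override are invented for this sketch, not taken from the question):

import hashlib
import urllib

from google.appengine.ext import db


class Profile(db.Model):
    user = db.UserProperty()
    # Hash stored once at write time instead of being recomputed on every render
    gravatar_hash = db.StringProperty()

    def put(self):
        self.gravatar_hash = hashlib.md5(self.user.email().lower()).hexdigest()
        return super(Profile, self).put()

    def gravatarUrl(self):
        return "http://www.gravatar.com/avatar.php?" + urllib.urlencode(
            {'gravatar_id': self.gravatar_hash, 'default': "...", 'size': "64"})

Existing entities would need one re-save so the stored hash gets populated.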

macbirdie
Thanks! I was getting the warning before I introduced the MD5, though.
The md5 code is all native, so I wouldn't expect the overhead of a single md5 sum over a short string to be significant.
Nick Johnson
It's still a bit unusual for a web application to run a cryptographic hash function 100 times per request. But you're right, it's not that computationally intensive.
macbirdie
+4  A: 

The high CPU usage will be due to fetching 100 entities per request. You have several options here:

  • Using Profile.all().fetch(100) will be ever so slightly faster, and easier to read besides.
  • Remove any extraneous properties from the Profile model. There's significant per-property overhead deserializing entities.
  • Display fewer users per page.
  • Store the output of this page in memcache, and render from memcache whenever you can. That way, you don't need to generate the page often, so it doesn't matter so much if it's high CPU. (A sketch combining this with the first point follows below.)
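As a rough sketch of the first and last points combined, assuming a Django-style view like the one in the question (the memcache key, the fragment template name, and the profiles_html variable are made up):

from django.template.loader import render_to_string
from google.appengine.api import memcache, users

import base.views            # the app's own render helper from the question
from models import Profile   # assumed location of the Profile model

PROFILES_KEY = 'profiles_fragment'   # hypothetical memcache key


def index(request):
    user = users.get_current_user()
    # Only the expensive part -- fetching and rendering 100 profiles -- is cached;
    # anything user-specific stays outside the cached fragment.
    fragment = memcache.get(PROFILES_KEY)
    if fragment is None:
        profiles = Profile.all().fetch(100)   # same result as the empty GQL query
        fragment = render_to_string('profiles_fragment.html',
                                    {'profiles': profiles})
        memcache.set(PROFILES_KEY, fragment, time=60)  # re-render at most once a minute
    return base.views.render('XXX.html',
                             dict(profiles_html=fragment, user=user))

The outer template would then emit the cached markup with something like {{ profiles_html|safe }} instead of looping over entities itself.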
Nick Johnson
Thanks a lot. I'll try these optimizations right away!
A: 

It depends on where you get the warning about too much CPU.

If it is in the dashboard, it is probably mostly datastore CPU, and there is no need for optimization.

If the request takes more than 10 seconds, you need to optimize.

If you get regular log warnings that a certain request is x.xx over the CPU limit, it means your application code is taking too long and needs optimization.

I have found that a lot of Django template work does not take much application CPU (50-100 Mcycle), as long as all the fields used by the template are precomputed.
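For example, a hedged sketch of precomputing in the question's view, so the template only reads plain dictionary keys (the key names are invented for this sketch):

from google.appengine.api import users

import base.views            # the app's render helper from the question
from models import Profile   # assumed location of the Profile model


def index(request):
    user = users.get_current_user()
    profiles = Profile.all().fetch(100)
    # Resolve the method calls once here instead of per-access in the template
    rows = [{'email': p.user.email(),
             'nickname': p.user.nickname(),
             'gravatar_url': p.gravatarUrl(),
             'short_display': p.shortDisplay()}
            for p in profiles]
    return base.views.render('XXX.html', dict(profiles=rows, user=user))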

marioddd
A: 

I had an issue with a lot of CPU being used for seemingly little work, which turned out to be queries running multiple times. E.g. in my Django template, I did post.comments.count and then looped through post.comments. This resulted in two executions - one getting the count, and one getting the entities. Oops!
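A hypothetical sketch of the fix, with made-up Post and template names: run the query once in the view and pass both the list and its length, so the template never triggers a second datastore call.

import base.views           # the app's render helper from the question
from models import Post     # hypothetical model with a 'comments' back-reference


def show_post(request, post_id):
    post = Post.get_by_id(int(post_id))
    comments = post.comments.fetch(1000)   # one datastore query for the comments
    return base.views.render('post.html',
                             dict(post=post,
                                  comments=comments,
                                  comment_count=len(comments)))  # no second query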

I'd also say grab a copy of Guido's Appstats. It won't help with the Python, but it's very useful to see the time spent in API calls (and the time between them - which often gives an indication of where you've got slow Python).

You can get the library here: https://sites.google.com/site/appengineappstats/

I wrote an article about it on my blog (with some screenshots): http://blog.dantup.com/2010/01/profiling-google-app-engine-with-appstats
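If you end up using the version of Appstats that later shipped with the App Engine SDK, wiring it in looks roughly like this (a sketch; the standalone library linked above may be configured slightly differently):

# appengine_config.py -- enables Appstats recording for every request
from google.appengine.ext.appstats import recording


def webapp_add_wsgi_middleware(app):
    # Wrap the WSGI application so Appstats records the RPC calls it makes
    return recording.appstats_wsgi_middleware(app)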


Danny Tuppeny