I'm fairly new to Google App Engine and Python, but I did just release my first real-world site with it. Now I'm having problems with one path that uses significantly more CPU (and API CPU) time than the other paths. I've narrowed it down to a single datastore fetch: Carvings.all().fetch(1000)

The App Engine dashboard reports "1040cpu_ms 846api_cpu_ms" pretty reliably for each request to that path. This may be the source of some unresponsiveness that my client has experienced with the site in general.

So I can't figure out what is so expensive about this query. Here is the related data model:

from google.appengine.ext import db

# CarvingImage is another db.Model defined elsewhere in the app.
class Carving(db.Model):
    title = db.StringProperty(required=True)
    reference_number = db.StringProperty()
    main_category = db.StringProperty()
    sub_category = db.StringProperty()
    image = db.ReferenceProperty(CarvingImage)
    description = db.TextProperty()
    price = db.FloatProperty()
    size = db.StringProperty()
    material = db.StringProperty()
    added_at = db.DateTimeProperty(auto_now_add=True)  # set once, on creation
    modified_at = db.DateTimeProperty(auto_now=True)   # updated on every put()

In other places in the app, when I pull this model from the datastore I do more filtering, and I guess that's why those queries aren't causing any trouble. But the total number of entities for this model is just above 90, and I just can't imagine why this is so expensive.

A: 

Sometimes you'll get better performance if you do an indexed query, rather than a query of "all" elements in the model.

Also, consider using memcache.
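
A rough sketch of what the memcache suggestion might look like, written as a plain cache-aside helper. The names `get_all_carvings`, `DictCache`, the key "all_carvings" and the 10-minute expiry are made up for illustration. On App Engine you would presumably pass the `google.appengine.api.memcache` module as `cache` (it has `get`, `set` and `delete` with this shape) and something like `lambda: Carving.all().fetch(1000)` as the fetch function.

```python
class DictCache(object):
    """In-process stand-in with the same get/set/delete shape as memcache.

    Hypothetical helper so the sketch runs anywhere; it ignores expiry.
    """

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Like memcache.get, returns None on a miss.
        return self._data.get(key)

    def set(self, key, value, time=0):
        self._data[key] = value
        return True

    def delete(self, key):
        self._data.pop(key, None)


def get_all_carvings(cache, fetch_from_datastore, key="all_carvings", ttl=600):
    """Return the cached list, hitting the datastore only on a cache miss."""
    carvings = cache.get(key)
    if carvings is None:
        carvings = fetch_from_datastore()
        cache.set(key, carvings, time=ttl)
    return carvings


def invalidate_carvings(cache, key="all_carvings"):
    """Call this after adding or editing a carving so stale data is dropped."""
    cache.delete(key)
```

With this shape the expensive fetch is only paid on the first request after a write or an expiry. Keep in mind that memcache entries can be evicted at any time, so the datastore path still has to work.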

dmazzoni
MEMCACHE! MEMCACHE! MEMCACHE! Facebook has 12,000+ memcached servers.
Sneakyness
I assume by indexed query you mean using some conditions? If so, then I was doing that before but simplified the query in hopes of ruling out possibilities. But the performance did decrease when I removed the index.
donut
Who is downvoting me? App Engine is optimized for indexed queries, and "all" queries, while often fast, are sometimes slower. It's a quirk of the design.
dmazzoni
+2  A: 
  • Memcache, if you haven't already, and especially if the same carvings are going to be fetched again and again. If you only have 90 total, I would imagine they would all be in the cache pretty quickly, and then you should be golden.

  • Do you need all the properties of the Carvings? For example, if you're just displaying a list of carvings, you could have a separate Entity that was something like CarvingSummary that only had a few properties. This would mean your schema was denormalized, but sometimes that's the price you pay for speed.
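
To make the CarvingSummary idea a bit more concrete, here is a minimal sketch in plain Python. The field choice (title, reference number, price) and the `summarize` helper are assumptions, not something from the question. On App Engine the summary would more likely be its own small db.Model kind that gets written whenever a Carving is saved.

```python
from collections import namedtuple

# Hypothetical lightweight record holding only what a list page needs.
CarvingSummary = namedtuple(
    "CarvingSummary", ["title", "reference_number", "price"]
)


def summarize(carving):
    """Copy the list-page fields, leaving out the description and image."""
    return CarvingSummary(
        title=carving.title,
        reference_number=carving.reference_number,
        price=carving.price,
    )
```

The list page would then query only the summaries, and the full Carving (with its TextProperty and image reference) would be loaded on the detail page.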

Also, I'm assuming this is not the first page the user will always hit? If that were the case, it could be the cloud spinning up a new instance.

Peter Recore
Memcache worked magnificently. Though, I'm still surprised that a query that only returns ~90 items took so much CPU time. Thanks!
donut
I agree. I'm not super familiar with the Python side of things, and am wondering about the ReferenceProperty. Are you sure that the image in your ReferenceProperty isn't being pulled in somehow?
Peter Recore
A: 

Do you actually need 1000 entities? CPU time goes up more or less linearly with the number of results retrieved, so if you don't actually need all the results, you may be wasting a lot of time fetching and decoding them.

Nick Johnson
There are only 90 entities of that kind in the datastore total, so I don't think it's bringing back 1000.
Peter Recore
It's not bringing back 1000, but in the case that there are a thousand I do need to pull them all. But yes, in that case I would need to work out some pagination. But it's unlikely that the site would ever reach that many.
donut
Also, I tried changing the fetch down to 100 (since I only have about 90 total) and the performance was the same as trying to fetch 1000.
donut
A: 

It could be the image (and/or Text property) that is taking time to load and marshal into objects, depending on how big those properties are.

First prize: just use the memcache as others say. Then the overhead is incurred only on the first hit.

Second prize: I'm not sure how often your images are being changed and how many you might have, but you could consider uploading them as static files and simply linking to them in your HTML. Then it'd be just an HTTP GET from the browser - much lower overhead.

Richard Watson