I wanted to benchmark GAE read performance. Around 10,000 entities are being fetched from the datastore. These entities contain three properties: name (around 16 chars), description (around 130 chars), and a timestamp. Nothing unusually large.

Here's what I see:

On average it takes around 11 seconds to read 10k entities. Not sure whether this is considered fast, slow, or reasonable, but it is not too exciting regardless.

A more interesting find is the CPU metering. Performing this read operation 100 times consumes about 3.0 CPU-hours, at a cost of $0.30.

Given that there is no CPU-intensive algorithm going on here, doesn't that make GAE's CPU bandwidth quite expensive? (Sure, it comes with 24/7 sys-admins in the form of Python scripts, etc., but still...)
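For perspective, here is the back-of-the-envelope arithmetic implied by the figures above (the $0.10/CPU-hour rate is simply the stated $0.30 divided by 3.0 CPU-hours; nothing else is assumed):

```python
# Per-entity cost breakdown derived from the measured figures.
cpu_hours = 3.0            # CPU time billed for 100 runs of the read
cost_usd = 0.30            # what those 3.0 CPU-hours cost
runs = 100                 # number of times the 10k-entity read was performed
entities_per_run = 10_000

rate_per_cpu_hour = cost_usd / cpu_hours                         # $ per CPU-hour
cpu_ms_per_entity = cpu_hours * 3_600_000 / (runs * entities_per_run)

print(f"${rate_per_cpu_hour:.2f} per CPU-hour")   # prints $0.10 per CPU-hour
print(f"{cpu_ms_per_entity:.1f} CPU-ms per entity")  # prints 10.8 CPU-ms per entity
```

So each tiny three-property entity is costing roughly 10.8 CPU-milliseconds just to read, which is the crux of the question.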

Or is this something in my Java code:

http://github.com/akirekadu/GAE-Evaluation/blob/master/show.jsp

+3  A: 

It's not your code; I believe your result is accurate. In our own experiments, we found that retrieval and (especially) storage are very expensive operations in terms of CPU quota.

We noted that:

  • Indexes are expensive. If you're writing more than reading, be stingy with your indexes. Make sure you know about the indexed=False attribute on model properties, and pay close attention to what gets auto-generated into index.yaml.

  • If you're reading more than you're writing, then lots of multi-indexes may make sense. Use memcache where you can. Use entity groups, if they make sense.

  • The App Engine API gives you tools that improve efficiency, and they matter a lot. If you're writing 100 rows, a single batch put() of all 100 entities uses far less CPU than 100 individual put() calls.
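The batching point is really about RPC overhead: each put() call is a round trip to the datastore service, and each round trip carries a fixed CPU cost. A toy counter (not the real SDK — the actual google.appengine.ext.db.put accepts either one entity or a list, as modeled here) shows the difference in call volume:

```python
class FakeDatastore:
    """Stand-in for the datastore client; counts round trips (RPCs)."""
    def __init__(self):
        self.rpc_count = 0
        self.rows = []

    def put(self, entity_or_list):
        # Like db.put() in the GAE SDK, accept one entity or a batch;
        # either way it costs a single round trip.
        self.rpc_count += 1
        batch = entity_or_list if isinstance(entity_or_list, list) else [entity_or_list]
        self.rows.extend(batch)

db = FakeDatastore()
for row in range(100):            # 100 individual put() calls
    db.put(row)
individual_rpcs = db.rpc_count    # 100 round trips

db = FakeDatastore()
db.put(list(range(100)))          # one batch put() of all 100 rows
batch_rpcs = db.rpc_count         # 1 round trip
```

Same 100 rows stored either way; the batch version pays the per-call overhead once instead of 100 times.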
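The memcache advice amounts to putting a read-through cache in front of the datastore. This sketch uses a plain dict where real code would call google.appengine.api.memcache.get/set; the function and key names are illustrative, not from the question's code:

```python
cache = {}            # stand-in for memcache
datastore_reads = 0   # how many times we paid for a datastore fetch

def fetch_from_datastore(key):
    # Placeholder for an expensive datastore query.
    global datastore_reads
    datastore_reads += 1
    return f"entity-{key}"

def get_entity(key):
    # Read-through: check the cache first, fall back to the datastore,
    # then populate the cache so repeat reads are (nearly) free.
    if key in cache:
        return cache[key]
    entity = fetch_from_datastore(key)
    cache[key] = entity
    return entity

first = get_entity("report")    # miss: one datastore read
second = get_entity("report")   # hit: served from cache, no new read
```

For a read-heavy workload like the 10k-entity report in the question, this is usually the single biggest CPU saving available.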

If your app is going to frequently perform large reads as you've described, you may wish to choose a different solution (e.g., a VPS like Slicehost or Linode) or a different data model. Each app will have different needs with respect to disk, CPU, memory, etc., so I leave the back-of-the-envelope calculations as an exercise for the reader.

HTH!

Max