I have written a Google App Engine application that programmatically generates a bunch of HTML that is really the same output for every user who logs into my system, and I know this is going to be inefficient once the code goes into production. So I am trying to figure out the best way to cache the generated pages.

The most likely option is to generate the pages and write them to the database, and then compare the time of the database put operation for a given page against the time the code was last updated. If the code is newer than the last put to the database (for a particular HTML request), new HTML will be generated, served, and cached to the database. If the code is older than the last put, I will just fetch the HTML directly from the database and serve it, avoiding the CPU cost of generating it again. I am looking to minimize not only load times but also CPU usage.

However, one issue I am having is that I can't figure out how to programmatically check when the version of the code uploaded to App Engine was last updated.

I am open to any suggestions on this approach, or on other approaches for caching generated HTML.

Note that while memcache could help in this situation, I believe it is not the final solution, since I really only need to regenerate the HTML when the code is updated (as opposed to every time the memcache entry expires).

Kind Regards, and thank you in advance for any suggestions you may be able to offer. -Alex

A: 

If I understand the question correctly, can't you save the following in the database:

  • Cached HTML page
  • Time of last page cache
  • Time of last update that will change page

Then, in the routines that change the page, store the current time (time.now or whatever) as the 'Time of last update that will change page'.

Then, when a user loads the page, check that 'Time of last page cache' is more recent than 'Time of last update that will change page' and, if so, serve the cached version. Otherwise re-render the page and update the cache.
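
Something like this is roughly what I mean. It is only a sketch: a plain dict stands in for the database, and the names (`CacheEntry`, `mark_updated`, `get_page`, `render`) are made up for illustration, not App Engine APIs.

```python
import time


class CacheEntry(object):
    """One cached page: the HTML plus the time it was cached."""

    def __init__(self, html, cached_at):
        self.html = html
        self.cached_at = cached_at


# Hypothetical stand-ins for database records.
_store = {}                   # page key -> CacheEntry
_last_update = {"t": 0.0}     # 'Time of last update that will change page'


def mark_updated(now=None):
    """Call this from any routine that changes what the page would show."""
    _last_update["t"] = time.time() if now is None else now


def get_page(key, render, now=None):
    """Serve the cached HTML if it is newer than the last update, else re-render."""
    now = time.time() if now is None else now
    entry = _store.get(key)
    if entry is not None and entry.cached_at > _last_update["t"]:
        return entry.html
    html = render()
    _store[key] = CacheEntry(html, now)
    return html
```

The `now` parameter is only there so the check can be exercised with fixed timestamps; in real use you would leave it out.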

tm1rbrt
Yes, that is what I would like to do. However, it is the third part, "Time of last update that will change page", that I am having trouble generating programmatically. It depends on when the code was *last uploaded* to App Engine, which I would like to be able to extract automatically if possible. If that is not possible, I suppose I can manually set some kind of flag in the database using the admin console, and regenerate the HTML if the stored HTML is older than the flag.
Alexander
Erm, I can't see anything in the docs for reading that kind of info from the script files. You could set up a macro in your text editor that increments a version-number constant in the file and check that against the database, but I think that's getting a bit silly. Maybe just set up a handler at something like /updatedFiles that YOU can run whenever you update any script files?
tm1rbrt
+3  A: 

In order of speed:

  1. memcache
  2. cached HTML in data store
  3. full page generation

Your caching solution should take this into account. I would recommend using memcache anyway. It will be faster than accessing the data store in most cases, and when you're generating a large block of HTML, one of the main benefits of caching is that you avoid the I/O penalty of accessing the data store. If you cache using the data store, you still pay that I/O penalty. The difference between regenerating everything and pulling cached HTML from the data store is likely to be fairly small unless you have a very complex page. It's probably better to get a bunch of very fast cache hits off memcache and do a full regenerate every once in a while than to make a call out to the data store every time.

There's nothing stopping you from invalidating the cached HTML in memcache when you update, and if your traffic is high enough to warrant it, you can always build a multi-level caching system.
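
Here is a rough sketch of the multi-level idea, with invalidation on update handled by putting the code version into the cache key. Two assumptions to flag: I believe the App Engine Python runtime exposes the deployed version as the `CURRENT_VERSION_ID` environment variable, but verify that for your runtime; and `DictCache` is just a hypothetical stand-in so the example runs anywhere. In a real app the fast level would be memcache and the slow level a data store entity.

```python
import os

# Assumption: the App Engine Python runtime sets CURRENT_VERSION_ID
# (something like "major.minor"). Fall back to "dev" everywhere else.
CODE_VERSION = os.environ.get("CURRENT_VERSION_ID", "dev")


class DictCache(object):
    """Hypothetical stand-in for memcache or a data store model."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


fast_cache = DictCache()   # plays the role of memcache
slow_cache = DictCache()   # plays the role of the data store


def cache_key(page, version=CODE_VERSION):
    # A new code version produces new keys, so old HTML is never served.
    return "html:%s:%s" % (version, page)


def get_html(page, render, version=CODE_VERSION):
    """Try the fast cache, then the slow cache, then regenerate."""
    key = cache_key(page, version)
    html = fast_cache.get(key)
    if html is not None:
        return html
    html = slow_cache.get(key)
    if html is None:
        html = render(page)
        slow_cache.set(key, html)
    fast_cache.set(key, html)   # refill the fast level on the way out
    return html
```

Because the version is part of the key, nothing has to be deleted after a deploy; entries for the old version are simply never read again and can be left to expire or be cleaned up later.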

However, my main concern is that this is premature optimization. If you don't have the traffic yet, keep caching to a minimum. App Engine provides a set of really convenient performance analysis tools, and you should be using those to identify bottlenecks after you've got at least a few QPS of traffic.

Anytime you're doing performance optimization, measure first! A lot of performance "optimizations" turn out to either be slower than the original, exactly the same, or they have negative user experience characteristics (like stale data). Don't optimize until you're certain you have to.

Bob Aman
Hi Bob, thanks for your feedback! I will take your suggestions into account!
Alexander
Definitely benchmark memcache if you want to put largeish amounts of data in it. On App Engine it pickles everything with a pure-Python pickle implementation, IIRC, and this *could* end up pretty slow.
tosh
If the object he's inserting is raw pre-rendered HTML, that shouldn't matter.
Bob Aman
+2  A: 

A while ago I wrote a series of blog posts about writing a blogging system on App Engine. You may find the post on static generation of HTML pages of particular interest.

Nick Johnson
A: 

This is not a complete solution, but it might offer an interesting option for caching.

Google App Engine frontend caching gives you a way of caching without using memcache.
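
As a rough illustration, frontend caching is driven by ordinary HTTP cache headers on the response. The sketch below is a bare WSGI app that marks its output as publicly cacheable; the one-hour lifetime and the names `app` and `CACHE_SECONDS` are my own choices for the example, and whether the frontend actually caches a given response is not guaranteed.

```python
CACHE_SECONDS = 3600  # illustrative lifetime, not a recommended value


def app(environ, start_response):
    """Minimal WSGI app that marks its HTML as publicly cacheable."""
    body = b"<html><body>Same page for every user</body></html>"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        # "public" lets shared caches (not just the browser) store the page.
        ("Cache-Control", "public, max-age=%d" % CACHE_SECONDS),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

This only makes sense for pages that really are identical for every user, which is the situation described in the question.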

Albert