I have a web app that should provide the following functionality:

  1. The user gives a URL.
  2. The webapp gets JSON data from that URL and from other URLs on the same website (this can take anywhere from 1 to 10 seconds).
  3. The webapp uses the data to generate a page for the user.

With this approach, I believe that while the server is fetching the data for one user, other users won't be able to load the page (the server is busy). I would like to avoid this if possible.

It seems like the App Engine Task Queue API would be useful for this, but I don't see how I can run the task and then use its output to generate the page. How would the main app know when the task has finished?

What is the best way to resolve this?

Thanks in advance

A:

Some ideas:

1) App Engine can serve more than one request at a time. Try it out: App Engine will probably spin up more than one instance of your app, so multiple requests can be handled at once. With requests this long, though, I wouldn't expect it to scale much (Google recommends keeping request/response times under 1 second).

2) If you want to return quickly to the user, you could enqueue a task on the task queue as you suggested. Then have the user's web page poll the server every couple of seconds (via a meta http-equiv refresh tag, or perhaps JavaScript) to see whether the page is ready.

3) If the generated page may be needed again, consider memcaching it to save the effort of generating it again. With load times of up to 10 seconds, you might even consider storing the built pages in the datastore for a little while (if caching is appropriate for your app).
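Point 1 is the direct answer to the "server busy" worry: as long as more than one request can be in flight, a slow fetch for one user does not block another. The sketch below simulates that in plain Python 3, with a thread pool standing in for App Engine's instances and `time.sleep` standing in for waiting on the remote site. It is not App Engine code; `handle_request` and `serve_concurrently` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(name, work_seconds):
    """Hypothetical slow handler: the sleep stands in for waiting on the remote site."""
    time.sleep(work_seconds)
    return name


def serve_concurrently(requests, workers):
    """Run (name, seconds) requests on `workers` threads, submitted in order.

    Returns {name: seconds from start until that request finished}.
    """
    start = time.monotonic()
    finished = {}

    def run(name, seconds):
        handle_request(name, seconds)
        finished[name] = time.monotonic() - start

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run, name, seconds) for name, seconds in requests]
    # Leaving the with-block waits for every submitted request to finish.
    for future in futures:
        future.result()  # re-raise any exception from a handler
    return finished


if __name__ == "__main__":
    jobs = [("slow user", 1.0), ("fast user", 0.1)]
    print(serve_concurrently(jobs, workers=1))  # fast user waits behind slow user
    print(serve_concurrently(jobs, workers=2))  # fast user is not blocked
```

With one worker the fast request finishes only after the slow one; with two it finishes almost immediately. Because each request holds a worker for its whole duration, 10-second requests need roughly one worker per concurrent user, which is why point 1 on its own is unlikely to scale well.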

Here's a very basic example of approach 2), storing the built page in the datastore as suggested in 3):

```python
import cgi
import hashlib
import time

from google.appengine.api.labs import taskqueue
from google.appengine.ext import db, webapp
from google.appengine.ext.webapp.util import run_wsgi_app

class BuiltPage(db.Model):
    html = db.TextProperty()

class PageBuilder(webapp.RequestHandler):
    """Handler called by the task queue to build the page."""
    def post(self):
        key = self.request.get('key')
        url = self.request.get('url')
        time.sleep(5)  # pretend it takes a while to build the page
        # do real stuff to build the page here; escape user input before echoing it
        html = "you asked for %s" % cgi.escape(url)
        BuiltPage(key_name=key, html=html).put()  # should check for errors ...

def html_to_redir_to_built_page(hexkey):
    """Page to show while we wait.  Auto-refreshes until we get the built page."""
    new_url = '/get_built_page?k=' + hexkey
    refresh_tag = '<meta http-equiv="refresh" content="2;url=%s"/>' % new_url
    return '<html><head>%s</head><body>please wait</body></html>' % refresh_tag

class BuildPageForURL(webapp.RequestHandler):
    """Handles requests by a user to build the page for the requested URL."""
    def get(self):
        url = self.request.get('url')
        # request.get() returns unicode; hash its UTF-8 bytes so non-ASCII URLs work
        key = hashlib.md5(url.encode('utf-8')).hexdigest()
        # optimization: check datastore to see if it was already generated?
        taskqueue.add(url='/buildit', params=dict(key=key, url=url))
        self.redirect('/get_built_page?k=' + key)

class GetBuiltPage(webapp.RequestHandler):
    """Returns the built page if it is ready, otherwise a page which will retry later."""
    def get(self):
        key = self.request.get('k')
        bp = BuiltPage.get_by_key_name(key)
        if bp:
            self.response.out.write(bp.html)
            # maybe cleanup if you know this is a 1-time request: bp.delete()
        else:
            self.response.out.write(html_to_redir_to_built_page(key))

application = webapp.WSGIApplication([('/',               BuildPageForURL),
                                      ('/buildit',        PageBuilder),
                                      ('/get_built_page', GetBuiltPage)])

def main():
    run_wsgi_app(application)

if __name__ == '__main__':
    main()
```
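The "check datastore to see if it was already generated?" comment in `BuildPageForURL`, combined with point 3's idea of keeping pages only for a little while, can be sketched without App Engine. This plain-Python 3 sketch assumes an in-memory dict as a stand-in for the `BuiltPage` entities and a caller-supplied `enqueue` function in place of `taskqueue.add`; `PageCache`, `request_page`, and `finish_build` are hypothetical names, and the one-day default `max_age` is an assumption.

```python
import hashlib
import time


def cache_key(url):
    """Same idea as the example: MD5 hex digest of the URL's UTF-8 bytes."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


class PageCache:
    """In-memory stand-in for the BuiltPage entities, with an expiry time.

    Entries older than max_age seconds are treated as missing, so the page is rebuilt.
    """

    def __init__(self, max_age=24 * 60 * 60, clock=time.time):
        self.max_age = max_age
        self.clock = clock  # injectable so expiry can be tested without waiting
        self._entries = {}  # key -> (html, time it was built)

    def put(self, key, html):
        self._entries[key] = (html, self.clock())

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        html, built_at = entry
        if self.clock() - built_at > self.max_age:
            del self._entries[key]  # stale: forget it and report a miss
            return None
        return html


def request_page(url, cache, pending, enqueue):
    """Return fresh cached HTML for url, or enqueue a single build and return None.

    `pending` holds keys whose build is queued but not finished, so repeated
    polls for the same URL do not enqueue duplicate builds.
    """
    key = cache_key(url)
    html = cache.get(key)
    if html is not None:
        return html
    if key not in pending:
        pending.add(key)
        enqueue(key, url)  # stand-in for taskqueue.add(...)
    return None


def finish_build(key, html, cache, pending):
    """What the task-queue worker would do once the page is built."""
    cache.put(key, html)
    pending.discard(key)
```

A `None` result corresponds to serving the auto-refreshing "please wait" page. On App Engine the `pending` set would have to live in shared storage (memcache or the datastore), since instances don't share memory, and a failed build would need to clear its entry so the URL can be retried.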
David Underhill
Thank you very much for the detailed answer. My only concern is that the `BuiltPage.get_by_key_name` calls in `GetBuiltPage` are going to be costly, but I suppose that can't be avoided. I was thinking about coding the app entirely in JS and having the user download the files themselves, but then there wouldn't be any caching. I'd like to cache the content for a day or so, though it isn't strictly necessary. Thanks again!
mellort