My app needs to do many datastore operations on each request. I'd like to run them in parallel to get better response times.

For datastore updates I'm doing batch puts so they all happen asynchronously which saves many milliseconds. App Engine allows up to 500 entities to be updated in parallel.
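As a rough sketch of the batching described above: the helper below splits a list of entities into groups of at most 500 and hands each group to a `put` callable. On App Engine that callable would presumably be `db.put` from `google.appengine.ext.db`, which accepts a list of entities. Here it is passed in as a parameter so the sketch runs anywhere. The names `put_in_batches` and `MAX_BATCH` are hypothetical.

```python
MAX_BATCH = 500  # per-call entity limit mentioned above


def put_in_batches(entities, put, batch_size=MAX_BATCH):
    """Call put() once per slice of at most batch_size entities.

    `put` is assumed to accept a list of entities (as db.put does on
    App Engine). Returns the number of batch calls made.
    """
    calls = 0
    for start in range(0, len(entities), batch_size):
        put(entities[start:start + batch_size])
        calls += 1
    return calls
```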

But I haven't found a built-in function that allows datastore fetches of different kinds to execute in parallel.

Since App Engine does allow urlfetch calls to run asynchronously, I created a getter URL for each kind which returns the query results as JSON-formatted text. Now my app can do async urlfetch calls to these URLs which could parallelize the datastore fetches.
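A minimal sketch of this fan-out, assuming the `urlfetch` argument is the `google.appengine.api.urlfetch` module (or any object exposing `create_rpc` and `make_fetch_call` the same way). It is passed in as a parameter only so the example can run outside App Engine. The function name `fetch_all` is hypothetical, and the URLs would be the getter URLs described above.

```python
def fetch_all(urlfetch, urls):
    """Start one async fetch per URL, then collect the bodies in order.

    `urlfetch` is expected to be google.appengine.api.urlfetch.
    """
    rpcs = []
    for url in urls:
        rpc = urlfetch.create_rpc()
        urlfetch.make_fetch_call(rpc, url)  # starts the request, does not wait
        rpcs.append(rpc)

    # All requests have been started; get_result() waits for each RPC in turn
    # and raises if that fetch failed.
    return [rpc.get_result().content for rpc in rpcs]
```

Each body would then be decoded from JSON by the caller.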

This technique works well with small numbers of parallel requests, but App Engine throws errors when attempting to run more than 5 or 10 of these urlfetch calls at the same time.

I'm only testing now, so each urlfetch is the identical query; since they work fine in small volumes but start failing with more than a handful of simultaneous requests, I'm thinking it must have something to do with the async urlfetch calls.

My questions are:

  1. Is there a limit to the number of urlfetch.create_rpc() calls that can run asynchronously?
  2. The synchronous urlfetch.fetch() function has a 'deadline' parameter that will allow the function to wait up to 10 seconds for a response before failing. Is there any way to tell urlfetch.create_rpc() how long to wait for a response?
  3. What do the errors shown below mean?
  4. Is there a better server-side technique to run datastore fetches of different kinds in parallel?

    File "/base/python_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 501, in get_result
      return self.__get_result_hook(self)
    File "/base/python_lib/versions/1/google/appengine/api/urlfetch.py", line 331, in _get_fetch_result
      raise DownloadError(str(err))
    InterruptedError: ('The Wait() request was interrupted by an exception from another callback:', DownloadError('ApplicationError: 5 ',))

A: 

While I am afraid that I can't directly answer any of the questions that you pose, I think that I ought to tell you that all of your research along these lines may not lead you to a working solution for your problem.

The problem is that datastore writes take much longer than reads, so if you find a way to max out the number of reads that can happen, your code will run out of time long before it is able to make the corresponding writes to all of the entities that you have read.

I would seriously consider rethinking the design of your datastore classes to reduce the number of reads and writes that need to happen, as this will quickly become a bottleneck for your application.

Adam Crossland
The writes are not an issue since App Engine allows async batch puts. Even though an individual put takes longer than an individual fetch, by doing the puts in parallel the total time is acceptable. I'm trying to figure out how to do batch fetches of different kinds in parallel to accomplish the same thing with reads.
mb
A: 

Have you considered using TaskQueues to do the work of queuing the requests to be executed later?

If the task returns a 4xx status it will be considered failed and will be retried later, so you could pass the error back up and have the task queue handle retrying the requests until they succeed. Also, with some experimentation with bucket sizes and rates, you can probably have the Task Queue slow down the requests enough that you don't max out the database.

There's also a nice wrapper (deferred.defer) which makes things even simpler - you can make a deferred call to (almost) any function in your app.
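A minimal sketch of what that could look like, assuming `defer` is `google.appengine.ext.deferred.defer`, which takes a callable followed by its arguments. It is passed in as a parameter here only so the example runs outside App Engine. `fetch_kind` and `enqueue_fetches` are hypothetical names, and the body of `fetch_kind` is a placeholder for the real datastore work.

```python
def fetch_kind(kind_name):
    """Placeholder for the per-kind datastore work (hypothetical)."""
    return "fetched " + kind_name


def enqueue_fetches(defer, kind_names):
    """Queue one deferred fetch_kind call per kind.

    Returns the number of calls queued.
    """
    for name in kind_names:
        defer(fetch_kind, name)  # same shape as deferred.defer(func, *args)
    return len(kind_names)
```

Note that a deferred call runs later in a separate task, so its return value is not handed back to the code that queued it.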

James Polley
I don't want to defer the requests. I'm trying to do many datastore operations simultaneously and return the results to the user. App Engine provides batch puts to do concurrent writes of different kinds. But I haven't found a way to do concurrent reads of different kinds.
mb
A: 

Since App Engine allows async urlfetch calls but does not allow async datastore gets, I was trying to use urlfetch RPCs to retrieve from the datastore in parallel.

The lack of async datastore gets is an acknowledged issue:

http://code.google.com/p/googleappengine/issues/detail?id=1889

And there's now a third-party tool that allows async queries:

http://code.google.com/p/asynctools/

"asynctools is a library allowing you to execute Google App Engine API calls in parallel. API calls can be mixed together and queued up and then all are kicked off in parallel."

This is exactly what I was looking for.

mb