My GAE application retrieves JSON data from a third-party site. Given an ID representing the item to download, the item's data on that site is spread across multiple pages, so my code has to download chunks of data, page after page, until the last available page is retrieved.
My simplified code looks like this:

import json

from google.appengine.api import urlfetch
from google.appengine.ext import webapp

class FetchData(webapp.RequestHandler):
  def get(self):
    ...
    data_list = []
    page = 1
    while True:
      response = urlfetch.fetch('http://www.foo.com/getdata?id=xxx&result=JSON&page=%s' % page)
      fetched_data = json.loads(response.content)
      data_list = data_list + fetched_data["data"]
      # Stop once the last available page has been fetched.
      if page == int(fetched_data["total_pages"]):
         break
      else:
         page = page + 1
    ...
    doRender('dataview.htm', {'data_list': data_list})

The resulting data_list is an ordered list: its first item holds the data of page 1 and its last item holds the data of the last page. Once retrieved, this data_list is rendered in a view.

This approach works 99% of the time, but sometimes, due to the 30-second request limit imposed by Google App Engine, I get the dreaded DeadlineExceededError on items with many pages. I would like to know whether, using TaskQueue|Deferred|AsyncUrlfetch, I could improve this algorithm by parallelizing the N urlfetch calls in some way.
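
For what it's worth, I understand that Deferred alone would only buy a longer deadline, not parallelism. A minimal sketch of that fallback, with fetch_all_pages as a hypothetical wrapper that persists the result of the loop above:

from google.appengine.ext import deferred

def fetch_all_pages(item_id):
    # The while-loop above, storing data_list somewhere readable later
    # (e.g. memcache or the datastore) instead of rendering it directly.
    pass

# Task queue requests are allowed a much longer deadline than the
# 30 seconds granted to a user-facing request.
deferred.defer(fetch_all_pages, 'xxx')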

A: 

Use this: http://code.google.com/appengine/docs/python/urlfetch/asynchronousrequests.html

It is as simple as this:

def handle_result(rpc):
    result = rpc.get_result()
    # ... Do something with result...

# Use a helper function to define the scope of the callback.
def create_callback(rpc):
    return lambda: handle_result(rpc)

rpcs = []
for url in urls:
    rpc = urlfetch.create_rpc()
    rpc.callback = create_callback(rpc)
    urlfetch.make_fetch_call(rpc, url)
    rpcs.append(rpc)

# ...

# Finish all RPCs, and let callbacks process the results.
for rpc in rpcs:
    rpc.wait()
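
Adapted to the question, urls is just the list of page URLs, which can be built up front once total_pages is known (a sketch; it assumes an initial fetch has already supplied total_pages):

base = 'http://www.foo.com/getdata?id=xxx&result=JSON&page=%s'
urls = [base % page for page in range(1, total_pages + 1)]

Note that handle_result only receives the rpc, so it cannot tell which page a result belongs to; passing the page number into the callback, as the answer below does, solves that.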
Matt Williamson
It's fine where it is. Just replace the while section with the code above and alter as necessary. No globals needed.
Matt Williamson
@Matt Although not very detailed, your answer helped me focus on the async solution.
systempuntoout
A: 

I resolved it like this:

import json
from google.appengine.api import urlfetch

chunks_dict = {}

def handle_result(rpc, page):
    result = rpc.get_result()
    chunks_dict[page] = json.loads(result.content)["data"]

# Helper function that binds the right page number into each callback.
def create_callback(rpc, page):
    return lambda: handle_result(rpc, page)

rpcs = []
page = 1
while True:
    rpc = urlfetch.create_rpc(deadline=10)
    rpc.callback = create_callback(rpc, page)
    urlfetch.make_fetch_call(rpc, 'http://www.foo.com/getdata?id=xxx&result=JSON&page=%s' % page)
    rpcs.append(rpc)
    if page == total_pages:  # total_pages is known up front (see below)
       break
    else:
       page = page + 1

# Wait for all RPCs; the callbacks fill chunks_dict as results arrive.
for rpc in rpcs:
    rpc.wait()

# Reassemble the chunks in page order.
data_list = []
page_keys = chunks_dict.keys()
page_keys.sort()
for key in page_keys:
    data_list = data_list + chunks_dict[key]
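
The snippet assumes total_pages is known before the loop starts. In my case I learn it from a synchronous fetch of the first page, roughly like this (a sketch, assuming every page of the JSON response carries the total_pages field, as in the question):

first_response = urlfetch.fetch('http://www.foo.com/getdata?id=xxx&result=JSON&page=1')
first_data = json.loads(first_response.content)
total_pages = int(first_data["total_pages"])
chunks_dict[1] = first_data["data"]  # page 1 already fetched; the async loop can start at page = 2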
systempuntoout