views:

652

answers:

2

I have the nginx upload module handling site uploads, but still need to transfer files (let's say 3-20mb each) to our cdn, and would rather not delegate that to a background job.

What is the best way to do this with tornado without blocking other requests? Can i do this in an async callback?

+2  A: 

You may find it useful in the overall architecture of your site to add a message queuing service such as RabbitMQ.

This would let you complete the upload via the nginx module, then in the tornado handler, post a message containing the uploaded file path and exit. A separate process would be watching for these messages and handle the transfer to your CDN. This type of service would be useful for many other tasks that could be handled offline ( sending emails, etc.. ). As your system grows, this also provides you a mechanism to scale by moving queue processing to separate machines.

I am using an architecture very similar to this. Just make sure to add your message consumer process to supervisord or whatever you are using to manage your processes.

In terms of implementation, if you are on Ubuntu installing RabbitMQ is a simple:

sudo apt-get install rabbitmq-server

On CentOS w/EPEL repositories:

yum install rabbit-server

There are a number of Python bindings to RabbitMQ. Pika is one of them and it happens to be created by an employee of LShift, who is responsible for RabbitMQ.

Below is a bit of sample code from the Pika repo. You can easily imagine how the handle_delivery method would accept a message containing a filepath and push it to your CDN.

import sys
import pika
import asyncore

conn = pika.AsyncoreConnection(pika.ConnectionParameters(
        sys.argv[1] if len(sys.argv) > 1 else '127.0.0.1',
        credentials = pika.PlainCredentials('guest', 'guest')))

print 'Connected to %r' % (conn.server_properties,)

ch = conn.channel()
ch.queue_declare(queue="test", durable=True, exclusive=False, auto_delete=False)

should_quit = False

def handle_delivery(ch, method, header, body):
    print "method=%r" % (method,)
    print "header=%r" % (header,)
    print "  body=%r" % (body,)
    ch.basic_ack(delivery_tag = method.delivery_tag)

    global should_quit
    should_quit = True

tag = ch.basic_consume(handle_delivery, queue = 'test')
while conn.is_alive() and not should_quit:
    asyncore.loop(count = 1)
if conn.is_alive():
    ch.basic_cancel(tag)
    conn.close()

print conn.connection_close
Ryan Cox
if i used an offline/otherwise-delayed job processing queue, i'd definitely opt for something like this -- most likely celeryd+redis. but, i'd prefer to handle the upload in the view as to not have the user refreshing a page showing "please wait while we handle your upload".
Carson
understood. my suggestion above was for the staging -> cdn portion of the process flow. I agree that you would want to handle the upload to staging via nginx and provide the user a status indicator.
Ryan Cox
A: 

advice on the tornado google group points to using an async callback (documented at http://www.tornadoweb.org/documentation#non-blocking-asynchronous-requests) to move the file to the cdn.

the nginx upload module writes the file to disk and then passes parameters describing the upload(s) back to the view. therefore, the file isn't in memory, but the time it takes to read from disk–which would cause the request process to block itself, but not other tornado processes, afaik–is negligible.

that said, anything that doesn't need to be processed online shouldn't be, and should be deferred to a task queue like celeryd or similar.

Carson