I'm building a mobile photo sharing site in Python, similar to TwitPic, and have been exploring various queues to handle the image processing. I've looked into RabbitMQ and ActiveMQ, but I suspect there is a better solution for my use case. I'm looking for something a little more lightweight, and I'm open to any suggestions.

+1  A: 

You could write a daemon that uses Python's built-in multiprocessing library and its Queue class.

All you should have to do is set up a pool of workers, and have them wait on jobs from the Queue. Your main process can dump new jobs into the Queue, and you're good to go.
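A minimal sketch of that layout, assuming jobs are plain file names and that `process_image` is a hypothetical stand-in for the real image work. The names `STOP`, `worker`, and `run` are illustrative and not part of any library:

```python
import multiprocessing as mp

STOP = None  # sentinel value that tells a worker to exit its loop


def process_image(job):
    """Hypothetical stand-in for real image work such as resizing."""
    return "processed:" + job


def worker(jobs, results):
    # Block on the queue until a job arrives; stop when the sentinel shows up.
    for job in iter(jobs.get, STOP):
        results.put(process_image(job))


def run(job_names, n_workers=2):
    """Start a small pool, feed it jobs, and collect the results."""
    jobs, results = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=worker, args=(jobs, results))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    for name in job_names:
        jobs.put(name)
    for _ in procs:
        jobs.put(STOP)  # one sentinel per worker
    # Drain the results before joining so no worker blocks on a full pipe.
    out = [results.get() for _ in job_names]
    for p in procs:
        p.join()
    return sorted(out)
```

Call `run(["a.jpg", "b.jpg"])` from under an `if __name__ == "__main__":` guard, since some platforms start workers by re-importing the main module. A long-running daemon would keep the workers alive and feed the queue as uploads arrive rather than sending the sentinels right away.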

Fragsworth
I agree. Just make sure the task-sorting code can't be killed by an exception, which would stop the whole process.
Szundi
Yeah, wrap everything in a try: except: block and **log** any exceptions that do occur, but allow the process to continue.
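One way to read this advice, as a sketch only: wrap each job in a helper that logs the traceback and reports failure instead of raising. The helper name `safe_call` and the logger name are assumptions made for illustration:

```python
import logging

logger = logging.getLogger("image_worker")  # hypothetical logger name


def safe_call(handler, job):
    """Run handler(job); log and swallow any exception so the caller's loop survives.

    Returns (True, result) on success and (False, None) on failure.
    """
    try:
        return True, handler(job)
    except Exception:
        # logger.exception records the message together with the full traceback.
        logger.exception("job %r failed", job)
        return False, None
```

Catching `Exception` rather than using a bare `except:` leaves `KeyboardInterrupt` and `SystemExit` alone, so the daemon can still be shut down cleanly.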
Fragsworth
He's talking about distributed message queues.
Newton Falls
Newton Falls: Python's multiprocessing queue **is** a distributed message queue (it can be shared across machines through multiprocessing.managers). It just doesn't necessarily use the AMQP standard. He also asked for a more "lightweight" solution, and I see no reason why image processing can't be done this way.
Fragsworth
You're right. My bad.
Newton Falls
A: 

Are you considering a single-machine architecture or a cluster of machines? Forwarding the image to an available worker process on the same machine or on a different machine isn't profoundly different, particularly if you use TCP sockets. What gradually makes the problem more complicated is knowing which workers are available, spawning more when necessary and resources allow, having a fail-safe mechanism if a worker crashes, and so on.

It could be something as simple as using httplib (http.client in Python 3) to push the image to a private server running Apache or Twisted and a collection of CGI applications. When you add another server, distribute the requests among the servers round-robin.
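A rough sketch of the push-and-rotate idea, written for Python 3's `http.client`. The worker addresses, the `/process` path, and the function names are all assumptions made for illustration:

```python
import http.client
import itertools

# Hypothetical private worker servers; replace with real hosts.
WORKERS = [("10.0.0.1", 8080), ("10.0.0.2", 8080)]
_rotation = itertools.cycle(WORKERS)


def next_worker():
    """Return the next (host, port) pair in round-robin order."""
    return next(_rotation)


def push_image(image_bytes, path="/process"):
    """POST raw image bytes to the next worker and return the HTTP status.

    The /process endpoint is assumed, not part of any real server.
    """
    host, port = next_worker()
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("POST", path, body=image_bytes,
                     headers={"Content-Type": "application/octet-stream"})
        return conn.getresponse().status
    finally:
        conn.close()
```

Adding a server then only means appending to `WORKERS`. This sketch does nothing about a worker that is down, which is where the fail-safe concerns mentioned above come in.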

Joel
A: 

Gearman is good in that it optionally allows you to synchronize multiple jobs executed on multiple workers.

I've used beanstalkd successfully in a few high-volume applications.

The latter is better-suited to async jobs, and the former gives you more flexibility when you'd like to block on job execution.

Dustin