I'd like to create a farm of processes that OCR text from images. My idea is to use a single message queue that is read by multiple OCR processes.

I would like to ensure that:

  • each message in the queue is eventually processed
  • the work is distributed more or less equally
  • each image is parsed by only one OCR process
  • an OCR process never receives multiple messages at once (so that any other free OCR process can handle the next message)

Is this possible using AMQP?

I'm planning to use Python and RabbitMQ.

+3  A: 

Yes, that's possible. The server cluster for a real-time MMO game I'm working on operates this way. We use ActiveMQ, but I think all of this is possible with RabbitMQ as well.

You get everything you mentioned out of the box, except the last item.

  • each message in the queue is eventually processed - this is one of the main responsibilities of a message broker
  • the work is more or less equally distributed - this is another one :)
  • an image will be parsed by only one OCR process - the distinction between /topic and /queue exists for this. Topics are like broadcast signals: every subscriber gets a copy. Queues hold tasks: each message goes to one consumer. You need a /queue in your scenario

To make the last one work the way you want, consumers send an ActiveMQ-specific argument when subscribing to the queue:

activemq.prefetchSize: 1

This setting guarantees that a consumer will not take any more messages after it has taken one, until it sends an ack to ActiveMQ. I believe something similar exists in RabbitMQ.
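For the Python and RabbitMQ setup from the question, the closest equivalent I know of is a prefetch count of 1 set through basic.qos, combined with manual acks. Below is a minimal sketch using the pika client (a third-party library, assumed to be installed). The queue name ocr_tasks, the run_ocr placeholder, and the helper names make_callback and run_worker are made up for illustration and are not part of any library.

```python
QUEUE_NAME = "ocr_tasks"  # hypothetical queue name


def run_ocr(image_bytes):
    """Placeholder for the real OCR step (hypothetical)."""
    return "<text from %d bytes>" % len(image_bytes)


def make_callback(ocr=run_ocr):
    """Build a pika-style consumer callback that acks only after OCR finishes."""
    def on_message(channel, method, properties, body):
        ocr(body)
        # Ack after the work is done. With prefetch_count=1 the broker
        # holds back further deliveries to this consumer until this ack.
        channel.basic_ack(delivery_tag=method.delivery_tag)
    return on_message


def run_worker(host="localhost"):
    """Connect to RabbitMQ and process one message at a time."""
    # Imported here so the callback above can be exercised without a broker.
    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
    channel = connection.channel()
    # Declaring is idempotent; durable=True lets the queue survive a broker restart.
    channel.queue_declare(queue=QUEUE_NAME, durable=True)
    # At most one unacknowledged message per consumer.
    channel.basic_qos(prefetch_count=1)
    channel.basic_consume(queue=QUEUE_NAME, on_message_callback=make_callback())
    channel.start_consuming()
```

Each OCR process would call run_worker(). Because the ack is sent only after the OCR step returns, a message held by a worker that dies is requeued by the broker and handed to another worker, which also helps with the "eventually processed" requirement.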

nailxx
+3  A: 

Piotr

Sorry to be slow, I only just noticed this. The answer to your question is yes, as @nailxx points out above. The AMQP programming model is slightly different from JMS in that you only have 'queues', which can either be shared between workers or used 'privately' by a single worker. You can also easily set up RabbitMQ for 'pubsub' use cases, which JMS calls 'topics'. Please go to our Getting Started page on the RabbitMQ web site to find a ton of helpful info about this.
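As a rough illustration of the difference between a shared queue and pubsub, here is a sketch of the publishing side. The functions take an already opened channel, for example one obtained from pika's BlockingConnection; the names ocr_tasks and ocr_events and the three helper functions are assumptions made for this example only.

```python
WORK_QUEUE = "ocr_tasks"        # hypothetical shared work queue
EVENTS_EXCHANGE = "ocr_events"  # hypothetical fanout exchange


def publish_task(channel, image_path):
    """Send one task to the shared queue; one of the workers receives it."""
    channel.queue_declare(queue=WORK_QUEUE, durable=True)
    # The default exchange ("") routes a message to the queue
    # whose name equals the routing key.
    channel.basic_publish(exchange="", routing_key=WORK_QUEUE,
                          body=image_path.encode("utf-8"))


def broadcast_event(channel, text):
    """Send an event to a fanout exchange; every bound queue gets a copy."""
    channel.exchange_declare(exchange=EVENTS_EXCHANGE, exchange_type="fanout")
    channel.basic_publish(exchange=EVENTS_EXCHANGE, routing_key="",
                          body=text.encode("utf-8"))


def subscribe_privately(channel):
    """Create a private, broker-named queue bound to the fanout exchange."""
    channel.exchange_declare(exchange=EVENTS_EXCHANGE, exchange_type="fanout")
    # An empty name asks the broker to generate one; exclusive=True
    # restricts the queue to this connection.
    result = channel.queue_declare(queue="", exclusive=True)
    queue_name = result.method.queue
    channel.queue_bind(exchange=EVENTS_EXCHANGE, queue=queue_name)
    return queue_name
```

For the OCR farm only publish_task is needed. The fanout pair is shown just to contrast it with the 'topic' style, where each subscriber has its own queue and sees every message.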

Now, for your use case in particular, there are already plenty of tools available. One that many people use, and that is well supported, is Celery. Here is a blog post about it that I think will help you get started: http://webcookies.org/blog/2009/09/10/rabbitmq-celery-and-django/

If you have any questions, please email us or post to the rabbitmq-discuss mailing list.

Cheers

alexis

alexis
+1 Celery looks lovely. I'll check it out at the nearest opportunity.
Piotr Czapla