I have a producer whose work I want to distribute across consumers using consistent hashing. For example, with consumer nodes X and Y, tasks A, B, and C should always go to consumer X, and D, E, and F to consumer Y. That assignment may shift a little if Z joins the pool of consumers.
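
To make the behaviour I'm after concrete, here's a rough sketch of a consistent-hash ring (the node and task names are just the ones from the example above):

    import hashlib
    from bisect import bisect

    class HashRing:
        """Minimal consistent-hash ring: each key maps to the nearest node clockwise."""

        def __init__(self, nodes, replicas=100):
            self.replicas = replicas
            self.ring = {}      # ring position -> node name
            self.points = []    # sorted ring positions
            for node in nodes:
                self.add(node)

        def _hash(self, key):
            return int(hashlib.md5(key.encode()).hexdigest(), 16)

        def add(self, node):
            for i in range(self.replicas):
                point = self._hash("%s:%d" % (node, i))
                self.ring[point] = node
                self.points.append(point)
            self.points.sort()

        def node_for(self, task):
            idx = bisect(self.points, self._hash(task)) % len(self.points)
            return self.ring[self.points[idx]]

    ring = HashRing(["X", "Y"])
    print({t: ring.node_for(t) for t in "ABCDEF"})   # each task sticks to X or Y
    ring.add("Z")                                     # only some tasks move to Z
    print({t: ring.node_for(t) for t in "ABCDEF"})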

I didn't want to deal with writing my own logic to connect to the consumer nodes, and especially not with managing nodes joining and leaving the pool, so I've gone down the path of using RabbitMQ, and an exclusive queue per consumer node.

One problem I'm running into is listing these queues, since the producer needs to know all the available queues before work is distributed. AMQP doesn't even support listing queues, which makes me uncertain of my whole approach. RabbitMQ and Alice (broken at the moment) add that functionality, though: http://stackoverflow.com/questions/2840290/is-there-an-api-for-listing-queues-and-exchanges-on-rabbitmq
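
For context, listing queues over the management plugin's HTTP API looks roughly like this (assuming the plugin is enabled; the port and the guest credentials are defaults and may differ on your setup):

    import requests

    # GET /api/queues returns a JSON array of queue objects when the
    # RabbitMQ management plugin is enabled.
    resp = requests.get("http://localhost:15672/api/queues",
                        auth=("guest", "guest"))
    resp.raise_for_status()
    print([q["name"] for q in resp.json()])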

Is this a wise use of Rabbit? Should I be using a message queue at all? Is there a better design so the queue can consistently divide my work among consumers, instead of me needing to do it?

+1  A: 

What you describe is do-able in RabbitMQ.

Your setup would be something like:

  • the producer publishes messages to a topic exchange; let's name it consistent_divider;
  • when a consumer joins the pool, it connects to the broker and creates an exclusive queue named after itself, but doesn't bind it to anything (see the consumer-side sketch after this list);
  • the producer periodically polls the broker (maybe using rabbitmqctl list_consumers) to check whether the consumers have changed; if they have, it removes all of the existing bindings and rebinds the queues as needed;
  • when the producer publishes, messages are assigned a routing key that matches their task type.
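
Roughly, the consumer side could look like this sketch (Python pika client, 1.x API; the queue and exchange names are illustrative):

    import pika

    NODE_NAME = "C1"  # this consumer's own identifier (illustrative)

    connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
    channel = connection.channel()

    # Make sure the shared exchange exists (idempotent if also declared elsewhere).
    channel.exchange_declare(exchange="consistent_divider", exchange_type="topic")

    # Exclusive queue named after this consumer; deliberately left unbound --
    # the producer manages the bindings.
    channel.queue_declare(queue=NODE_NAME, exclusive=True)

    def handle(ch, method, properties, body):
        print("got task", method.routing_key, body)

    channel.basic_consume(queue=NODE_NAME, on_message_callback=handle, auto_ack=True)
    channel.start_consuming()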

So, if you have 6 task types: A, B, C, D, E, F, and only two consumers C1 and C2, your bindings would look like this: C1 bound three times to consistent_divider with routing keys A, B and C; C2 bound three times to consistent_divider with routing keys D, E and F.
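
On the producer side, (re)binding the queues to match the current assignment could look something like this sketch (same illustrative names; the consumer queues are assumed to exist already):

    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
    channel = connection.channel()
    channel.exchange_declare(exchange="consistent_divider", exchange_type="topic")

    # Desired task-type -> consumer-queue assignment (hard-coded here; in
    # practice computed after polling the broker for the current consumers).
    assignments = {"A": "C1", "B": "C1", "C": "C1",
                   "D": "C2", "E": "C2", "F": "C2"}

    # Stale bindings from a previous assignment would be dropped first with
    # channel.queue_unbind(queue=..., exchange="consistent_divider", routing_key=...).
    for task_type, queue in assignments.items():
        channel.queue_bind(queue=queue,
                           exchange="consistent_divider",
                           routing_key=task_type)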

When C3 joins the pool, the producer sees this and rebinds the queues accordingly.

When the producer publishes, it sends out the messages with routing_keys A, B, C, D, E and/or F, and the messages will get routed to the correct queues.
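
Publishing is then just a matter of setting the routing key to the task type, e.g.:

    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
    channel = connection.channel()

    # The broker delivers this to whichever consumer queue is currently
    # bound to routing key "E".
    channel.basic_publish(exchange="consistent_divider",
                          routing_key="E",
                          body=b"payload for an E task")
    connection.close()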

There would be two potential problems with this:

  1. There's a slight lag between when a consumer joins the pool and when messages start being routed to it; also, if there are messages already in the queues, a consumer can receive messages meant for another consumer (e.g. C3 joins and the producer rebinds, but C2 still gets some E and F messages because they were already in its queue);
  2. If a consumer dies for whatever reason, the messages in its queue (and en route to its queue) will be lost; this can be addressed by republishing and dead-lettering the messages, respectively (see the dead-letter sketch after this list).
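
Dead-lettering in RabbitMQ is usually configured through a dead-letter exchange; here's a rough sketch of wiring one up (names illustrative; whether it covers every failure mode here, e.g. an exclusive queue being deleted along with its consumer, is worth verifying for your case):

    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
    channel = connection.channel()

    # An exchange/queue pair that collects dead-lettered messages.
    channel.exchange_declare(exchange="dead_letters", exchange_type="fanout")
    channel.queue_declare(queue="dead_letter_queue")
    channel.queue_bind(queue="dead_letter_queue", exchange="dead_letters", routing_key="")

    # Consumer queues declared with x-dead-letter-exchange reroute rejected
    # or expired messages to dead_letters instead of dropping them.
    channel.queue_declare(queue="C1", exclusive=True,
                          arguments={"x-dead-letter-exchange": "dead_letters"})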

To answer your last question: you probably do want to use queuing, and RabbitMQ is a great choice, but your requirements (more precisely the "divide the work consistently" bit) don't quite fit AMQP.

scvalex
Very thorough, encouraging answer! One question: you say the consumers don't bind themselves. Is this any different from the consumers binding themselves with a routing key equal to their queue name? Then instead of list_consumers I could use list_queues directly, with no further binding needed.
Bluu
Queues are automatically bound to the default exchange (the nameless direct exchange) with their name as the binding key. As far as I can tell, this isn't what you want. You're not publishing to consumers, so to speak; you're publishing to task types, and you want some consumers to handle many tasks. So I was thinking of a one-task-one-binding mapping, where consumers that deal with multiple tasks have multiple bindings. In addition, since when the bindings change, all (or most) of them change, it doesn't seem right for a consumer to handle this, so the producer seems better suited for the job.
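
For illustration, publishing to the nameless exchange with the routing key set to a queue name delivers straight to that queue, e.g.:

    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
    channel = connection.channel()

    # The empty string names the default exchange; the routing key is treated
    # as a queue name, so this lands directly in queue C1 (which must exist).
    channel.basic_publish(exchange="", routing_key="C1", body=b"direct to C1")
    connection.close()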
scvalex