We have two pieces of architecture that, in essence, form a producer and a consumer. Piece 1 (p1) publishes messages to Piece 2 (p2), which processes each message. Processing involves sending the message to a remote node, which must ack the message once it has processed it; this takes a few seconds at best.

p2's queue has a finite length, and items are not removed until p2 receives the ack from the remote node. Because of this, p2 can return a QUEUE_FULL response to p1. When p1 receives this response it keeps a queue of its own: whenever a new message is produced, p1 adds it to the end of this queue and then cycles through the queue, sending messages to p2 until it once again gets a QUEUE_FULL. The problem is that once p2's queue is empty or has space, p2 has no way to notify p1 to send the queued messages.
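To make the stall concrete, here is a minimal in-process sketch of the behaviour described above. The class names `BoundedConsumer` and `Producer`, the `offer`/`ack`/`drain` methods, and the `OK` constant are all hypothetical stand-ins; the real p1 and p2 talk over the network.

```python
from collections import deque

QUEUE_FULL = "QUEUE_FULL"
OK = "OK"  # hypothetical success response


class BoundedConsumer:
    """Stand-in for p2: keeps each message until the remote node acks it."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pending = deque()

    def offer(self, message):
        if len(self.pending) >= self.capacity:
            return QUEUE_FULL
        self.pending.append(message)
        return OK

    def ack(self):
        # Simulates the remote node acking the oldest message.
        return self.pending.popleft()


class Producer:
    """Stand-in for a p1 producer with its own backlog."""

    def __init__(self, consumer):
        self.consumer = consumer
        self.backlog = deque()

    def publish(self, message):
        self.backlog.append(message)
        self.drain()

    def drain(self):
        # Send in order until p2 refuses; a refused message stays at the head.
        while self.backlog:
            if self.consumer.offer(self.backlog[0]) == QUEUE_FULL:
                return
            self.backlog.popleft()


consumer = BoundedConsumer(capacity=2)
producer = Producer(consumer)
for i in range(5):
    producer.publish(f"m{i}")

# The remote node acks both messages held by p2, so p2 now has space...
consumer.ack()
consumer.ack()
# ...but m2, m3 and m4 stay in p1's backlog until something calls drain() again.
```

In this sketch only another `publish` triggers `drain`, which is exactly the gap: nothing wakes the producer when space appears.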

For each consumer instance in p2 there is a corresponding producer in p1; this is important for the potential solutions below.

One solution would be to change p2 to notify p1 when there is space in its queue. However, this requires a fair amount of network overhead (HTTP), as at any one time many thousands of p2 queues may need to notify their corresponding p1 producers.

The other solution would be to change p1 to keep attempting to send the message to p2. The problem is that each producer in p1 would need a thread that sleeps for x before trying to send the next message. A singleton could handle this sleep/retry mechanism, but as the number of producers and consumers grows to many thousands the logic gets rather complex:

  • synchronization on adding, removing, producers
  • reading queues, making next read times
  • considerations for tight looping when low producer count
  • considerations for long waits when high producer count
  • .... etc
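One way the singleton could be shaped, as a rough sketch only: keep a heap of pending retries ordered by due time and back off exponentially on failure. The `RetryScheduler` class, its parameters, and the convention that a drain function returns `True` once its backlog is empty are all assumptions made for illustration. The sketch is single-threaded and takes the current time as an argument; a real version would need a lock around the heap and a driver thread that sleeps until `next_due()`.

```python
import heapq
import itertools


class RetryScheduler:
    """One shared retry queue instead of one sleeping thread per producer."""

    def __init__(self, base_delay=0.5, max_delay=30.0):
        self.base_delay = base_delay
        self.max_delay = max_delay
        self._heap = []  # entries: (due_time, seq, drain_fn, delay)
        self._seq = itertools.count()  # tie-breaker so functions are never compared

    def schedule(self, drain_fn, now, delay=None):
        delay = self.base_delay if delay is None else delay
        heapq.heappush(self._heap, (now + delay, next(self._seq), drain_fn, delay))

    def next_due(self):
        # The driver can sleep until this time; None means nothing is waiting.
        return self._heap[0][0] if self._heap else None

    def tick(self, now):
        """Run every retry that is due and return how many were attempted."""
        attempted = 0
        while self._heap and self._heap[0][0] <= now:
            _, _, drain_fn, delay = heapq.heappop(self._heap)
            attempted += 1
            if not drain_fn():
                # Still blocked: retry later with a doubled, capped delay.
                self.schedule(drain_fn, now, min(delay * 2, self.max_delay))
        return attempted


attempts = []


def flaky_drain():
    # Hypothetical drain that only succeeds on its third attempt.
    attempts.append(1)
    return len(attempts) >= 3


sched = RetryScheduler(base_delay=1.0)
sched.schedule(flaky_drain, now=0.0)
early = sched.tick(0.5)   # nothing is due yet
first = sched.tick(1.0)   # fails, rescheduled for 3.0
second = sched.tick(3.0)  # fails, rescheduled for 7.0
third = sched.tick(7.0)   # succeeds, nothing left to retry
```

Sleeping until `next_due()` avoids the tight loop when few producers are waiting, and the per-producer backoff keeps the wait from growing with the producer count.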

I'm close to suggesting an MQ tier that p1 publishes to and p2 reads from. However, this introduces a new issue: p2 is not able to notify p1 when the remote node goes away. This could be handled by an HTTP callback from p2 to p1; the overhead is acceptable because the chance of the remote node going away is low.

Am I missing a design pattern which would remove the need for an MQ (yet another service to worry about, monitor, etc)? Thoughts much appreciated.

Some other details:

  • each p1 producer instance is request scoped for the most part
  • each p2 consumer is a dedicated running thread
+1  A: 

Consider three possibilities:

  • What about opening yet another MQ for service commands (instead of HTTP invocations)?
  • Consider making p2 multithreaded, where one thread extracts messages from the MQ without waiting and hands them to another thread for processing.
  • (!) Use a transactional version of MQ, so p2 can extract messages immediately and p1 can place them as fast as it can. If processing fails, the queue is rolled back.
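The transactional idea in the last bullet might look roughly like the following in-process sketch. `TransactionalQueue`, its `receive`/`commit`/`rollback` methods, and the `process` helper are hypothetical names used only for illustration; a real MQ product exposes its own transaction API.

```python
from collections import deque
import itertools


class TransactionalQueue:
    """Toy queue: a received message stays in flight until commit or rollback."""

    def __init__(self):
        self._ready = deque()
        self._in_flight = {}
        self._ids = itertools.count(1)

    def put(self, message):
        self._ready.append(message)

    def receive(self):
        if not self._ready:
            return None
        tx_id = next(self._ids)
        self._in_flight[tx_id] = self._ready.popleft()
        return tx_id, self._in_flight[tx_id]

    def commit(self, tx_id):
        # Processing succeeded: forget the message for good.
        del self._in_flight[tx_id]

    def rollback(self, tx_id):
        # Processing failed: put the message back at the head of the queue.
        self._ready.appendleft(self._in_flight.pop(tx_id))

    def ready(self):
        return list(self._ready)


def process(queue, send_to_remote):
    """Take one message, forward it, and commit only if the send succeeded."""
    received = queue.receive()
    if received is None:
        return False
    tx_id, message = received
    try:
        send_to_remote(message)
    except Exception:
        queue.rollback(tx_id)
        return False
    queue.commit(tx_id)
    return True


def remote_down(message):
    raise ConnectionError("remote node unavailable")


queue = TransactionalQueue()
queue.put("a")
queue.put("b")

failed = process(queue, remote_down)  # rolled back, "a" is first again
after_failure = queue.ready()

delivered = []
process(queue, delivered.append)
process(queue, delivered.append)
```

With this shape p1 never sees QUEUE_FULL; a failed send simply leaves the message on the queue for the next attempt.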
Dewfy
I like this idea. P1 and P2 can subscribe to control message channels on each other, similar to the whole XON/XOFF flow control of yesteryear. When P1 gets empty, it sends a message to P2's control channel saying fire 'er up.
Chris Kaminski
But gods, don't introduce another MQ server or HTTP service. The whole point of using MQ is to get away from that.
Chris Kaminski
@darthcoder, I'm reading @dewfy's first bullet as another MQ channel, not another MQ service. @dewfy - nice idea with the admin channel; simple, and I missed it. Thanks!
Mike
+2  A: 

Mike,

It seems like the process has a significant amount of complexity (with the possibility of introducing more) just to avoid using MQ. There may be plenty of reasons NOT to use MQ (cost, in my experience), but if you have access to it, use it with wild abandon! :) It's far easier to monitor the new MQ process than it is to write the code to introduce similar capabilities.

Ideally, a robust queue would prevent P1 ever really needing to know about P2, or its status.

MQ should also really mitigate the need for P2 to notify P1 that its remote node went down: P1 can continue to happily queue up messages to P2 (depending on message frequency/size/storage limits). If the remote node is down for a significant amount of time, then hopefully it was a planned event and operators can shut down P1. The administrative channel between P2 and P1 sounds like a nice-to-have?

It also introduces additional complexity. You know your environment, but it can lead to questions like "why am I not getting messages anymore?" when it turns out that one service autonomously shut down another. Done right, this is awesome and relieves a support burden for the operators; done wrong, it just adds more support burden. Nobody likes that guy.

Could you also queue at the data tier, where storage for P2 might not be as much of an issue?

Embrace the queue (MQ, MSMQ, SQL queue)!

Z

Zach Bonham
Hi Zach, I agree with you in principle. The addition of MQ is very nice on many levels. My primary concern about adding another service has to do with processes where I work, so I wanted to ensure I've covered all other options before embarking upon this.
Mike
Completely understand being in that position! Good luck!
Zach Bonham