We have two pieces of architecture that in essence form a producer and a consumer. Piece 1 (p1) publishes messages to Piece 2 (p2), which processes each message. Processing involves sending the message to a remote node, which must ack the message once it has processed it; this takes a few seconds at best.
p2's queue has a finite length, and items are not removed until p2 receives the ack from the remote node. Because of this, p2 can return a QUEUE_FULL response to p1. When p1 receives this response it starts keeping its own queue: whenever a new message is produced, p1 adds it to the end of this queue and then cycles through the queue, sending messages to p2 until it gets QUEUE_FULL again. The problem is that once p2's queue is empty or has space, p2 has no way to notify p1 to send the queued messages.
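To make the stall concrete, here is a minimal in-process sketch of the current behaviour. The class and method names (`Consumer`, `Producer`, `offer`, `drain`) are made up for illustration and the real system talks over the network, so treat it as a model of the logic only.

```python
from collections import deque

QUEUE_FULL = "QUEUE_FULL"
OK = "OK"


class Consumer:
    """Stand-in for one p2 queue: holds messages until the remote node acks."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.in_flight = deque()

    def offer(self, message):
        # Refuse the message if every slot is still waiting for an ack.
        if len(self.in_flight) >= self.capacity:
            return QUEUE_FULL
        self.in_flight.append(message)
        return OK

    def ack(self):
        # The remote node has acked the oldest message, freeing a slot.
        return self.in_flight.popleft()


class Producer:
    """Stand-in for one p1 producer with its local backlog."""

    def __init__(self, consumer):
        self.consumer = consumer
        self.backlog = deque()

    def publish(self, message):
        # New messages always join the end of the backlog first.
        self.backlog.append(message)
        return self.drain()

    def drain(self):
        # Send from the head of the backlog until p2 refuses a message.
        # Returns True if the backlog was emptied, False on QUEUE_FULL.
        while self.backlog:
            if self.consumer.offer(self.backlog[0]) == QUEUE_FULL:
                return False
            self.backlog.popleft()
        return True
```

With a capacity of 2, publishing four messages leaves two in the backlog. After `consumer.ack()` frees a slot, nothing calls `drain()` again until the next `publish()`, which is exactly the gap described above.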
For each consumer instance in p2 there is a corresponding producer in p1; this is important when it comes to the potential solutions below.
One solution would be to change p2 to notify p1 when there is space in its queue. However, this requires a fair amount of network overhead (HTTP), since at any one time many thousands of p2 queues may need to notify their corresponding p1 producers.
The other solution would be to change p1 to keep attempting to send the message to p2. The problem is that each producer in p1 then needs a thread that sleeps for x before trying to send the next message. A singleton could handle this sleep/retry mechanism, but as the number of producers and consumers grows to many thousands the logic gets rather complex:
- synchronization when adding and removing producers
- reading queues and working out the next read time for each
- considerations for tight looping when the producer count is low
- considerations for long waits when the producer count is high
- ... etc.
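For what it's worth, the singleton does not have to loop over every producer. One possible shape, sketched below, is a single priority queue keyed by next retry time with a per-producer backoff. `RetryScheduler` and the `drain()` contract (returns True once the backlog is empty) are assumptions for illustration, and the sketch is deliberately single-threaded, so the synchronization concern is not addressed here.

```python
import heapq
import itertools


class RetryScheduler:
    """One shared retry loop for many producers (a sketch, not thread-safe)."""

    def __init__(self, base_delay=0.1, max_delay=5.0):
        self.base_delay = base_delay
        self.max_delay = max_delay
        self._heap = []                 # entries: (due_time, seq, producer, delay)
        self._seq = itertools.count()   # tie-breaker so producers are never compared

    def register(self, producer, now):
        """Called when a producer first gets QUEUE_FULL."""
        self._push(producer, now, self.base_delay)

    def _push(self, producer, now, delay):
        heapq.heappush(self._heap, (now + delay, next(self._seq), producer, delay))

    def next_due(self):
        """Time of the earliest retry, or None; the loop can sleep until then."""
        return self._heap[0][0] if self._heap else None

    def run_due(self, now):
        """Retry every producer whose time has come; return how many were retried."""
        retried = 0
        while self._heap and self._heap[0][0] <= now:
            _, _, producer, delay = heapq.heappop(self._heap)
            retried += 1
            if not producer.drain():
                # Still blocked: back off, doubling the delay up to the cap.
                self._push(producer, now, min(delay * 2, self.max_delay))
        return retried
```

The driving thread would sleep until `next_due()` and then call `run_due()`. With few blocked producers it sleeps rather than spinning, and with many it only touches the ones that are actually due, which covers the tight-loop and long-wait cases in the list above.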
I'm close to suggesting an MQ tier that p1 publishes to and p2 reads from. However, this introduces a new issue: p2 can no longer notify p1 when the remote node goes away. That could be handled by an HTTP callback from p2 to p1; the overhead is acceptable because the chance of the remote node going away is low.
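Roughly what I have in mind for the MQ variant, with the standard library `queue.Queue` standing in for the real broker. `PullingConsumer` and the `on_remote_down` callback are hypothetical names, and a real MQ client would look different, so this only shows the direction of flow: p2 pulls when it has room instead of p1 pushing and being refused.

```python
import queue


class PullingConsumer:
    """Stand-in for a p2 consumer that pulls from the MQ tier when it has room."""

    def __init__(self, broker, capacity, on_remote_down):
        self.broker = broker                  # queue.Queue standing in for the MQ
        self.capacity = capacity
        self.in_flight = []                   # sent to the remote node, not yet acked
        self.on_remote_down = on_remote_down  # hypothetical callback to p1

    def pull(self):
        """Take messages from the broker while in-flight slots are free."""
        pulled = 0
        while len(self.in_flight) < self.capacity:
            try:
                message = self.broker.get_nowait()
            except queue.Empty:
                break
            self.in_flight.append(message)
            pulled += 1
        return pulled

    def ack(self, message):
        """Remote node acked: free the slot and refill from the broker."""
        self.in_flight.remove(message)
        return self.pull()

    def remote_down(self):
        """Rare path: tell p1 the remote node has gone away."""
        self.on_remote_down()
```

Here p1 never sees QUEUE_FULL because it only ever publishes to the broker, and each ack frees a slot and immediately triggers the next pull. The only p2-to-p1 traffic left is the rare `remote_down()` callback.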
Am I missing a design pattern which would remove the need for an MQ (yet another service to worry about, monitor, etc)? Thoughts much appreciated.
Some other details:
- each p1 producer instance is request scoped for the most part
- each p2 consumer is a dedicated running thread