When to shut down message processing in case of queue / database failure?

From what I understand, you're looking for the following:

1. Do not lose messages from the inbound queue just because the application cannot process them.
2. When to stop processing if errors occur during processing.

First of all, it's important to analyze the infrastructure you're dealing with and the kinds of failures you'll have to handle: typical downtimes and how often they occur in the various tiers of the system, how reliable the network is, whether your DB is a RAC server, and so on.

JMS already provides retry mechanisms. If message processing fails, the message goes back to the queue until the retries are exhausted. This makes sense only when coupled with a delay, so that the system isn't flooded with immediate redeliveries. If a small delay will not affect the transaction, I would recommend using delayed redelivery. Support for this depends on your JMS provider and is usually configured in a container-specific way. Routing a message to a dead letter or exception queue when it cannot be processed from the inbound queue helps prevent message loss.
Again, you are the best judge of your situation. You can define a property for how many consecutive sends to the dead letter queue constitute a shutdown condition, and tweak it during system testing to avoid false positives.
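
To make the retry, dead letter, and shutdown-counter idea concrete, here is a minimal provider-agnostic sketch in plain Java. It is not JMS API code: in a real deployment the broker usually performs redelivery and dead-lettering itself. The class name `RetryingProcessor` and the parameters `maxDeliveries`, `retryDelayMillis`, and `shutdownThreshold` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Illustrative only; a real JMS broker normally handles redelivery itself. */
final class RetryingProcessor {
    private final int maxDeliveries;      // hypothetical: attempts before dead-lettering
    private final long retryDelayMillis;  // hypothetical: pause between attempts
    private final int shutdownThreshold;  // hypothetical: consecutive dead letters that stop processing
    private final List<String> deadLetters = new ArrayList<>();
    private int consecutiveDeadLetters = 0;
    private boolean shutDown = false;

    RetryingProcessor(int maxDeliveries, long retryDelayMillis, int shutdownThreshold) {
        this.maxDeliveries = maxDeliveries;
        this.retryDelayMillis = retryDelayMillis;
        this.shutdownThreshold = shutdownThreshold;
    }

    /** Returns true if the handler succeeded; false if the message was dead-lettered or refused. */
    boolean process(String message, Predicate<String> handler) {
        if (shutDown) {
            return false; // stop consuming; the message stays on the inbound queue
        }
        for (int attempt = 1; attempt <= maxDeliveries; attempt++) {
            if (handler.test(message)) {
                consecutiveDeadLetters = 0; // any success resets the counter
                return true;
            }
            if (attempt < maxDeliveries) {
                try {
                    Thread.sleep(retryDelayMillis); // delay so retries do not flood the system
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false; // interrupted: leave the message for a later redelivery
                }
            }
        }
        deadLetters.add(message); // retries exhausted
        if (++consecutiveDeadLetters >= shutdownThreshold) {
            shutDown = true; // too many failures in a row: treat as an outage
        }
        return false;
    }

    boolean isShutDown() { return shutDown; }

    List<String> deadLetters() { return deadLetters; }
}
```

Counting only consecutive dead letters means a single poison message does not stop the system, while a run of failures (which suggests the database or another dependency is down) does.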

As cracked_all also mentioned, it is not recommended to give up immediately.

I think the best way would be to have other databases ready to take over if the primary fails. Upon receiving an unsuccessful acknowledgment, you can route the messages to the secondary one, so you don't lose much data. For this case, you can use the "Guaranteed Delivery" feature in JMS.

With Guaranteed Delivery, the messaging system uses a built-in data store to persist messages. Each computer the messaging system is installed on has its own data store so that the messages can be stored locally. When the sender sends a message, the send operation does not complete successfully until the message is safely stored in the sender’s data store. Subsequently, the message is not deleted from one data store until it is successfully forwarded to and stored in the next data store. In this way, once the sender successfully sends the message, it is always stored on disk on at least one computer until it is successfully delivered to and acknowledged by the receiver. [1]
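
The secondary-database routing suggested in this answer could be sketched roughly as follows. This is only an illustration under assumptions: `MessageStore` and `FailoverStore` are hypothetical names, and a real implementation would wrap actual database access and take part in the surrounding transaction.

```java
/** Hypothetical storage abstraction; a real one would wrap database access. */
interface MessageStore {
    void save(String message);
}

/** Tries the primary store first and falls back to the secondary if the primary fails. */
final class FailoverStore implements MessageStore {
    private final MessageStore primary;
    private final MessageStore secondary;

    FailoverStore(MessageStore primary, MessageStore secondary) {
        this.primary = primary;
        this.secondary = secondary;
    }

    @Override
    public void save(String message) {
        try {
            primary.save(message);
        } catch (RuntimeException primaryFailure) {
            try {
                secondary.save(message); // route to the standby database
            } catch (RuntimeException secondaryFailure) {
                // Both stores failed: rethrow so the caller can roll back
                // and the message is redelivered rather than lost.
                secondaryFailure.addSuppressed(primaryFailure);
                throw secondaryFailure;
            }
        }
    }
}
```

If both stores fail, the exception propagates, which in a transactional consumer would leave the message unacknowledged on the queue.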

Thank you for answering! I'm not worried about point 1 - the insert into the DB and acknowledgement of the message are already transactional in my setup (and duplicate submission detection is there as well). I was really looking to gather opinions on how to handle point 2, i.e. what other people have done.

xcut 2010-02-14 18:46:44

The dead letter queue is very interesting to me, and I would like to learn more about it. Cracked_all, what is your approach to tuning this setting? Could you please elaborate on that?

paradisonoir 2010-02-15 19:21:19

@xcut In the production system I'm currently working on, we have a MaxDeliveryCnt of 10 and a delay of 10 seconds. To give some perspective, this is a 24x7 system spread across the globe, with a 3-tier structure, handling 2K-3K unique transactions daily.

cracked_all 2010-02-16 13:24:31

@paradisonoir I'm not sure what it is that you need. Depending on your app server, the implementation details may vary widely. Currently I use another JMS queue to which the messages are redirected if they cannot be processed (after the retry count is reached). How the values for retry count and delay are chosen again depends on your system architecture, both software and hardware. More details on what you're looking for would help.

cracked_all 2010-02-16 13:25:57

@cracked_all Thanks for explaining the scenario. I was really interested, though, in your methods or approach for setting the retry count and delay values. Do you run some benchmarking analysis and tweak them accordingly? My second question is: what is your strategy when the redirect queue fails as well? Do you have some kind of backup plan? Thanks.

paradisonoir 2010-02-16 16:25:30

@cracked_all thanks, that's useful; I guess I will have to make my own calculations, and it will depend on the scenario. Some scenarios will involve thousands of messages per second; in those cases I guess failing quickly might be the only option.

xcut 2010-02-17 20:12:26

@paradisonoir I don't have a comment on the question you ask about setting the retry values/delays (you can configure those in a queue-specific way); what I can say is that the pattern discussed doesn't get around queue failure, only database failure. The retry queue is a transparent JMS queue that is usually handled by the same broker, so if the main queue fails, so will the retry queue. You could make your behaviour application-specific and switch to a completely different queue, I suppose.

xcut 2010-02-17 20:14:31

@paradisonoir: If the retry queue fails, it's the queue that should go down, not the application. That should alert the queue support folks to look into why the queue is failing and resolve the issue.

Gladwin Burboz 2010-02-18 04:24:01

@paradisonoir No, we didn't run any benchmarks; we based the values on two factors: 1. Since the transactions are supposed to be instantaneous, what delay is acceptable before users notice a lag? 2. How many retries are generally enough to ensure that a message is not rejected because of a spike in network/CPU/disk usage? If processing messages from the retry queue fails, we log them to a text file (and if even that isn't available, you have more serious problems to worry about than messages).

cracked_all 2010-02-18 05:58:16
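
A last-resort file fallback of the kind cracked_all describes might look like the sketch below. The class name `FileFallback` and the one-message-per-line format are assumptions for illustration, not details from the thread; messages that contain line breaks would need escaping.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical last-resort sink used when the retry queue itself cannot be processed. */
final class FileFallback {
    private final Path file;

    FileFallback(Path file) {
        this.file = file;
    }

    /** Appends one message per line; returns false if even the file cannot be written. */
    boolean record(String message) {
        byte[] line = (message + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
        try {
            // CREATE + APPEND: create the file on first use, then only add to the end.
            Files.write(file, line, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            return true;
        } catch (IOException e) {
            return false; // caller should escalate: nothing is left to fall back to
        }
    }
}
```

Returning a boolean rather than throwing lets the caller decide whether a failed fallback is itself a shutdown condition.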

@xcut I would agree: if the volume per second is that high, failing quickly may be the best option. However, keep in mind that sudden network spikes may cause your queue to shut down for no good reason, so take that into account before you decide on a shutdown counter. On second thought, inserting a deferred queue between the main and dead letter queues may give you a chance to process transactions with a delay if they cannot be processed immediately.

cracked_all 2010-02-18 06:02:04
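
The deferred queue mentioned in the comment above can be illustrated with an in-memory sketch. It is a simulation, not broker configuration: `DeferredQueue`, `delayMillis`, and `maxDeferrals` are hypothetical names, and time is passed in explicitly so the behaviour is easy to follow and test.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Predicate;

/** In-memory stand-in for a queue that sits between the main and dead letter queues. */
final class DeferredQueue {
    private static final class Entry {
        final String message;
        final int deferrals;     // how many times this message has been parked so far
        final long dueAtMillis;  // earliest time for the next attempt

        Entry(String message, int deferrals, long dueAtMillis) {
            this.message = message;
            this.deferrals = deferrals;
            this.dueAtMillis = dueAtMillis;
        }
    }

    private final PriorityQueue<Entry> waiting =
            new PriorityQueue<>(Comparator.comparingLong((Entry e) -> e.dueAtMillis));
    private final List<String> deadLetters = new ArrayList<>();
    private final long delayMillis;  // hypothetical: wait before each deferred attempt
    private final int maxDeferrals;  // hypothetical: deferrals allowed before dead-lettering

    DeferredQueue(long delayMillis, int maxDeferrals) {
        this.delayMillis = delayMillis;
        this.maxDeferrals = maxDeferrals;
    }

    /** Called when immediate processing fails: park the message instead of dead-lettering it. */
    void defer(String message, long nowMillis) {
        schedule(message, 0, nowMillis);
    }

    /** Retries every message whose delay has elapsed; failures are parked again or dead-lettered. */
    void retryDue(long nowMillis, Predicate<String> handler) {
        while (!waiting.isEmpty() && waiting.peek().dueAtMillis <= nowMillis) {
            Entry e = waiting.poll();
            if (!handler.test(e.message)) {
                schedule(e.message, e.deferrals, nowMillis);
            }
        }
    }

    private void schedule(String message, int deferralsSoFar, long nowMillis) {
        if (deferralsSoFar >= maxDeferrals) {
            deadLetters.add(message); // deferral budget exhausted
        } else {
            waiting.add(new Entry(message, deferralsSoFar + 1, nowMillis + delayMillis));
        }
    }

    int pending() { return waiting.size(); }

    List<String> deadLetters() { return deadLetters; }
}
```

Because a message only reaches the dead letter queue after its deferrals are used up, a short network spike delays processing instead of feeding the shutdown counter.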

@xcut: To handle spikes, you need to constantly monitor your application load and verify through load testing that your application can handle 2-3 times the maximum load you have seen in production.

Gladwin Burboz 2010-02-21 13:16:06

Thank you for the answer, one vote up. I'm really interested in hearing about the different policies people have used. I have no problem with persistent delivery, etc. (the messaging system in the question is already transactional, and uses JTA for outbound messaging that also requires DB updates).

xcut 2010-02-14 18:44:54

In that case, I think you're in pretty good shape.

paradisonoir 2010-02-15 19:18:10
