ansaurus

Question

WCF MSMQ - How do I handle message failure

Answer 1

+5 A:

I think with MSMQ 4.0 (available only on Windows Vista and Windows Server 2008) you might be able to do it like this:

<bindings>
    <netMsmqBinding>
        <binding name="PoisonMessageHandling"
          receiveRetryCount="3"
          retryCycleDelay="00:05:00"
          maxRetryCycles="3"
          receiveErrorHandling="Move" />
    </netMsmqBinding>
</bindings>

WCF retries immediately, up to ReceiveRetryCount times, after the first call fails. Once that batch has failed, the message is moved to the retry queue. After a delay of RetryCycleDelay (five minutes in the configuration above), the message is moved from the retry queue back to the endpoint queue and the batch is retried. This cycle is repeated MaxRetryCycles times. If all of that fails, the message is handled according to receiveErrorHandling, which can be Move (to the poison queue), Reject, Drop or Fault.
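To make the arithmetic concrete, here is a minimal sketch that reads the description above literally: one initial attempt plus ReceiveRetryCount retries per batch, and one initial batch plus MaxRetryCycles retry cycles. The class and method names are made up for illustration and are not part of WCF.

```csharp
using System;

// Hypothetical helper, not a WCF API: it only does the arithmetic.
public static class RetryMath
{
    // Each batch is one initial attempt plus receiveRetryCount retries;
    // there is one initial batch plus maxRetryCycles retry cycles.
    public static int MaxAttempts(int receiveRetryCount, int maxRetryCycles)
    {
        return (receiveRetryCount + 1) * (maxRetryCycles + 1);
    }

    // Total time the message waits in the retry queue across all cycles.
    public static TimeSpan TotalRetryDelay(int maxRetryCycles, TimeSpan retryCycleDelay)
    {
        return TimeSpan.FromTicks(retryCycleDelay.Ticks * maxRetryCycles);
    }
}
```

With the binding above (receiveRetryCount=3, maxRetryCycles=3, retryCycleDelay=00:05:00) that works out to at most 16 delivery attempts and 15 minutes of waiting in the retry queue before receiveErrorHandling applies.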

By the way, a good text on WCF and MSMQ is Chapter 9 of Juval Löwy's book Programming WCF Services.

aogan 2008-09-17 12:29:41

Answer 2

+1 A:

Unfortunately I'm stuck on Windows XP and Windows Server 2003, so that isn't an option for me. (I will clarify that in my question, as I found this solution after posting and realised I couldn't use it.)

I found that one solution was to set up a custom handler which would move my message onto another queue or poison queue and restart my service. This seemed crazy to me. Imagine how often the service would be restarted if my SQL Server was down.

So what I've ended up doing is allowing the service to fault and leaving the messages on the queue. I also log a fatal message to my system logging service to record that this has happened. Once our issue is resolved, I restart the service and all the messages start getting processed again.

I realised that re-processing this message, or any other, would fail as well, so why move this message and the others to another queue? I may as well stop my service and start it again when everything is operating as expected.

aogan, you had the perfect answer for MSMQ 4.0, but unfortunately it doesn't work for me.

WebDude 2008-09-17 12:43:24

Answer 3

+2 A:

If you're using SQL Server then you should use a distributed transaction, since both MSMQ and SQL Server support it. You wrap your database write in a TransactionScope block and call scope.Complete() only if it succeeds. If it fails, then when your WCF method returns, the message is placed back into the queue to be tried again. Here's a trimmed version of the code I use:

    // Needs: using System.Transactions; using System.Data.SqlClient; using System.ServiceModel;
    [OperationBehavior(TransactionScopeRequired=true, TransactionAutoComplete=true)]
    public void InsertRecord(RecordType record)
    {
        try
        {
            // Joins the ambient transaction that WCF created when it read the message
            using (TransactionScope scope = new TransactionScope(TransactionScopeOption.Required))
            using (SqlConnection insertConnection = new SqlConnection(ConnectionString))
            {
                // The connection is disposed even if an insert throws
                insertConnection.Open();

                // Insert statements go here

                // Vote to commit the transaction if there were no failures
                scope.Complete();
            }
        }
        catch (Exception ex)
        {
            // The scope was disposed without Complete(), so the transaction will not commit
            logger.WarnException(string.Format("Distributed transaction failure for {0}",
                Transaction.Current.TransactionInformation.DistributedIdentifier.ToString()),
                ex);
        }
    }

I test this by queueing up a large but known number of records, letting WCF start lots of threads to handle many of them simultaneously (it reaches 16 threads, so 16 messages are off the queue at once), then killing the process in the middle of operations. When the program is restarted, the messages are read back from the queue and processed again as if nothing happened, and at the conclusion of the test the database is consistent and has no missing records.

The transaction is ambient: when you create a new instance of TransactionScope, it automatically looks for the current transaction within the scope of the method invocation, which should already have been created by WCF when it popped the message off the queue and invoked your method.

C. Lawrence Wenham 2008-09-17 12:54:14

Hi Chris. I'm sure that marking your operation behavior with TransactionScopeRequired=true removes the need to wrap your SQL calls in a TransactionScope, as this is already being done. That being said, I'm not sure how your answer relates to my question about MSMQ.
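For comparison, a rough sketch of the operation without the inner scope might look like the following. It assumes exceptions are allowed to propagate rather than being caught and logged inside the method, because swallowing them here would let TransactionAutoComplete commit. RecordType's shape, the contract, the connection string and the Records table are all placeholders made up for illustration.

```csharp
using System.Data.SqlClient;
using System.Runtime.Serialization;
using System.ServiceModel;

// Hypothetical record type and contract, for illustration only.
[DataContract]
public class RecordType
{
    [DataMember]
    public string Name { get; set; }
}

[ServiceContract]
public interface IRecordService
{
    // Queued (netMsmqBinding) operations are one-way.
    [OperationContract(IsOneWay = true)]
    void InsertRecord(RecordType record);
}

public class RecordService : IRecordService
{
    // Placeholder connection string (an assumption, not from the post above).
    private const string ConnectionString =
        "Server=.;Database=Example;Integrated Security=true";

    [OperationBehavior(TransactionScopeRequired = true, TransactionAutoComplete = true)]
    public void InsertRecord(RecordType record)
    {
        // WCF has already made its transaction ambient, so the connection
        // enlists in it on Open() without an explicit TransactionScope.
        using (var connection = new SqlConnection(ConnectionString))
        using (var command = connection.CreateCommand())
        {
            connection.Open();
            // Hypothetical table and column names.
            command.CommandText = "INSERT INTO Records (Name) VALUES (@name)";
            command.Parameters.AddWithValue("@name", record.Name);
            command.ExecuteNonQuery();
        }
        // Returning normally lets TransactionAutoComplete commit.
        // An unhandled exception aborts the transaction instead.
    }
}
```

The explicit scope in the answer above still earns its keep if you want to catch and log the exception yourself, since leaving the scope without Complete() is what keeps the transaction from committing in that case.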

WebDude 2008-09-17 13:23:30

Answer 4

+3 A:

There's a sample in the SDK that might be useful in your case. Basically, what it does is attach an IErrorHandler implementation to your service that will catch the error when WCF declares the message to be "poison" (i.e. when all configured retries have been exhausted). What the sample does is move the message to another queue and then restart the ServiceHost associated with the message (since it will have faulted when the poison message was found).

It's not a very pretty sample, but it can be useful. There are a couple of limitations, though:

1- If you have multiple endpoints associated with your service (i.e. exposed through several queues), there's no way to know which queue the poison message arrived in. If you only have a single queue, this won't be a problem. I haven't seen any official workaround for this, but I've experimented with one possible alternative which I've documented here: http://winterdom.com/weblog/2008/05/27/NetMSMQAndPoisonMessages.aspx

2- Once the problem message is moved to another queue, it becomes your responsibility, so it's up to you to move it back to the processing queue once the timeout is done (or attach a new service to that queue to handle it).

To be honest, in either case, you're looking at some "manual" work here that WCF just doesn't cover on its own.
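To give an idea of the shape of that approach, here is a minimal sketch of such an error handler. It is not the SDK sample itself: the class names, the injected delegate and the queue paths are assumptions for illustration, and restarting the faulted ServiceHost and attaching the handler to the service's channel dispatchers are left out.

```csharp
using System;
using System.Messaging;
using System.ServiceModel;
using System.ServiceModel.Dispatcher;

// Hypothetical handler: reacts only to poison messages and delegates the move.
public class PoisonMessageHandler : IErrorHandler
{
    private readonly Action<long> movePoisonMessage;

    public PoisonMessageHandler(Action<long> movePoisonMessage)
    {
        this.movePoisonMessage = movePoisonMessage;
    }

    public void ProvideFault(Exception error,
        System.ServiceModel.Channels.MessageVersion version,
        ref System.ServiceModel.Channels.Message fault)
    {
        // Queued operations are one-way, so there is no fault to send back.
    }

    public bool HandleError(Exception error)
    {
        var poison = error as MsmqPoisonMessageException;
        if (poison == null)
        {
            return false; // not a poison message; let other handlers see it
        }
        movePoisonMessage(poison.MessageLookupId);
        return true;
    }
}

// Hypothetical helper that moves one message between two transactional queues.
public static class PoisonQueueMover
{
    public static void Move(string sourcePath, string poisonPath, long lookupId)
    {
        using (var source = new MessageQueue(sourcePath))
        using (var target = new MessageQueue(poisonPath))
        using (var tx = new MessageQueueTransaction())
        {
            tx.Begin();
            try
            {
                // Remove the message by its lookup id and re-send it atomically.
                var message = source.ReceiveByLookupId(
                    MessageLookupAction.Current, lookupId, tx);
                target.Send(message, tx);
                tx.Commit();
            }
            catch
            {
                tx.Abort();
                throw;
            }
        }
    }
}
```

The handler could then be built with something like new PoisonMessageHandler(id => PoisonQueueMover.Move(@".\private$\orders", @".\private$\ordersPoison", id)), where both queue paths are made-up examples. Passing the move in as a delegate keeps the handler itself free of any knowledge of which queue the message came from, which is the limitation described in point 1.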

I've recently been working on a different project where I have a requirement to explicitly control how often retries happen. My current solution was to create a set of retry queues and manually move messages between the retry queues and the main processing queue based on a set of timers and some heuristics, using just the raw System.Messaging classes to handle the MSMQ queues. It seems to work pretty nicely, though there are a couple of gotchas if you go this way.

tomasr 2008-09-17 17:36:23
