I have a computer that is running a single program that manages up to 48 individual processes on 4 other computers. I have the WCF services (one for each process) set up as such:

    public void StartService(Uri uri, string identifier)
    {
        unitMetaData = identifier;
        var binding = new WSDualHttpBinding(WSDualHttpSecurityMode.None);
        binding.ReliableSession.InactivityTimeout = TimeSpan.FromDays(20);
        var reader = binding.ReaderQuotas as XmlDictionaryReaderQuotas;
        reader.MaxStringContentLength = WCFContentSize; // 16777216
        service = new ServiceHost(this, uri);
        service.Faulted += TestService_Faulted;
        service.AddServiceEndpoint(
            typeof(IController),
            binding,
            identifier);
        service.Open();
    }

Here is the code for the remote processes:

    public void Connect()
    {
        // External binding used to change the WCF XML text content size
        var binding = new WSDualHttpBinding(WSDualHttpSecurityMode.None);
        binding.ReliableSession.InactivityTimeout = TimeSpan.FromDays(20);
        var reader = binding.ReaderQuotas as XmlDictionaryReaderQuotas;
        reader.MaxStringContentLength = WCFContentSize; // 16777216
        DuplexChannelFactory<IController> factory = new DuplexChannelFactory<IController>(new InstanceContext(this), binding);
        controllerChannel = factory.CreateChannel(new EndpointAddress(controllerAddress, new DnsEndpointIdentity(controllerAddress.DnsSafeHost), new System.ServiceModel.Channels.AddressHeaderCollection()));
        ((IClientChannel)controllerChannel).OperationTimeout = TimeSpan.FromSeconds(ChannelOperationTimeoutInSeconds); // 300
        controllerChannel.RequestTestData();
    }

I have some code that will call a remote "Ping()" function that simply returns the string "Pong" about every 30 seconds on each remote process. I did this to ensure that the connection stays open as I had some issue with the ReliableSession timing out. Occasionally (as in much too often for production code) I get the following exception from one and usually more services that test processes are connecting to:

An ExceptionDetail, likely created by IncludeExceptionDetailInFaults=true, whose value is:
System.ServiceModel.CommunicationObjectFaultedException: The communication object, System.ServiceModel.Channels.ServerReliableDuplexSessionChannel, cannot be used for communication because it is in the Faulted state.

Server stack trace: 
   at System.ServiceModel.Channels.TransmissionStrategy.WaitQueueAdder.Wait(TimeSpan timeout)
   at System.ServiceModel.Channels.TransmissionStrategy.InternalAdd(Message message, Boolean isLast, TimeSpan timeout, Object state, MessageAttemptInfo& attemptInfo)
   at System.ServiceModel.Channels.ReliableOutputConnection.InternalAddMessage(Message message, TimeSpan timeout, Object state, Boolean isLast)
   at System.ServiceModel.Channels.ReliableDuplexSessionChannel.OnSend(Message message, TimeSpan timeout)
   at System.ServiceModel.Channels.DuplexChannel.Send(Message message, TimeSpan timeout)
   at System.ServiceModel.Dispatcher.DuplexChannelBinder.Request(Message message, TimeSpan timeout)
   at System.ServiceModel.Channels.ServiceChannel.Call(String action, Boolean oneway, ProxyOperationRuntime operation, Object[] ins, Object[] outs, TimeSpan timeout)
   at System.ServiceModel.Channels.ServiceChannelProxy.InvokeService(IMethodCallMessage methodCall, ProxyOperationRuntime operation)
   at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message)

Exception rethrown at [0]: 
   at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)
   at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
   at SEL.MfgTestDev.ESS.ServiceContracts.ITestProcessClient.Ping()
   at SEL.MfgTestDev.ESS.Testing.Service.TestService.Ping() in C:\Projects\Mfg_TestDev_ESS_Rev3\branches\MSU-5-18-2010\ESS.Testing.Service\TestService.cs:line 349

So what's going on? Why is it suddenly ending up in a faulted state? Is there a way I can get the reason why a connection has faulted?

A: 

Assuming that you're using the same channel for pinging the remote service as for the other remote calls (which was the whole point of this ping, right?), could it be that one of the other method calls threw an exception or timed out and faulted your channel?

Also, in your configuration for ServiceBehaviors, is 'includeExceptionDetailInFaults' set to true? e.g.

    <behaviors>
       <serviceBehaviors>
          <behavior name="MyServiceBehaviors">
             <serviceDebug includeExceptionDetailInFaults="true" />
          </behavior>
       </serviceBehaviors>
    </behaviors>

During debugging this is useful, as it allows you to see the exception message from the server, but the downside is that it faults your channel too, so in a production environment it's best to leave it off.

theburningmonk
This doesn't really address the question, simply provides a debugging suggestion.
Firoso
I think includeExceptionDetailInFaults is not recommended for production only because it might expose internal details about your system, i.e. it is security related.
Stefan Egli
+3  A: 

It is not a good idea for a production environment, but you can try turning on WCF tracing on both the server and the clients. Hopefully you will find a better error description there.
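
As a rough sketch of what enabling tracing can look like, the fragment below adds a trace source for System.ServiceModel to an app.config. The listener name `wcfTrace` and the log path are placeholders chosen for illustration, not values from the question; the resulting .svclog file can be opened with SvcTraceViewer.exe.

```xml
<configuration>
  <system.diagnostics>
    <sources>
      <!-- Logs WCF warnings and errors, with activity correlation -->
      <source name="System.ServiceModel"
              switchValue="Warning, ActivityTracing"
              propagateActivity="true">
        <listeners>
          <add name="wcfTrace" />
        </listeners>
      </source>
    </sources>
    <sharedListeners>
      <!-- Hypothetical listener name and file path; adjust for your machine -->
      <add name="wcfTrace"
           type="System.Diagnostics.XmlWriterTraceListener"
           initializeData="c:\logs\wcf-trace.svclog" />
    </sharedListeners>
    <trace autoflush="true" />
  </system.diagnostics>
</configuration>
```

The same fragment would need to go into the config of each process you want to trace, ideally with a different file name per process.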

By the way, you had problems with the reliable session because it timed out after 10 minutes of inactivity. You set the inactivity timeout for the reliable session, but there is also a receive timeout on the binding, which is 10 minutes by default. If no message arrives within 10 minutes, the application session is closed: the service instance is destroyed and the reliable session is closed as well.
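
A minimal sketch of setting both timeouts, assuming .NET Framework with a reference to System.ServiceModel. The `BindingFactory` helper is a hypothetical name, and the 20-day value simply mirrors the InactivityTimeout from the question:

```csharp
using System;
using System.ServiceModel;

// Hypothetical helper so that service and clients build the binding the same way.
static class BindingFactory
{
    public static WSDualHttpBinding Create(int maxStringContentLength)
    {
        var binding = new WSDualHttpBinding(WSDualHttpSecurityMode.None);

        // Timeout of the reliable session itself (already set in the question).
        binding.ReliableSession.InactivityTimeout = TimeSpan.FromDays(20);

        // Receive timeout of the binding; defaults to 10 minutes if left unset.
        binding.ReceiveTimeout = TimeSpan.FromDays(20);

        binding.ReaderQuotas.MaxStringContentLength = maxStringContentLength;
        return binding;
    }
}
```

With both values raised, the periodic Ping() should no longer be needed just to keep the session alive, although it may still be useful as a health check.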

Edit:

The problem description is insufficient, and the architecture is very unusual. There is not one service communicating with 48 clients over duplex channels, but 48 identical services communicating with one client over duplex channels. This can of course introduce additional problems that are not known from common scenarios, so diagnostics (tracing / performance counters) are really needed!

Looking at the code of the Connect method, it even appears that the client callback is a singleton communicating with all 48 services, isn't it? Which concurrency mode is used on that callback? If the concurrency mode is Single, there can be timeout problems when calling the callback, because the message size is set to 16 MB. If all 48 processes send a 16 MB message at the same time, the messages are queued and processed in FIFO order. The default settings demand processing within 30 s; otherwise a timeout exception occurs and the channel is faulted. If the concurrency mode is Multiple, there can still be synchronization problems inside the callback implementation.
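
To illustrate the Multiple case, here is a sketch of a callback that accepts concurrent calls and guards its shared state with a lock. `ISampleCallback`, `SampleCallback` and `ReportResult` are hypothetical names, not the contract from the question; it assumes .NET Framework with System.ServiceModel:

```csharp
using System.ServiceModel;

// Hypothetical callback contract, for illustration only.
public interface ISampleCallback
{
    [OperationContract(IsOneWay = true)]
    void ReportResult(string payload);
}

// Multiple lets WCF dispatch callbacks concurrently instead of queuing them one by one.
[CallbackBehavior(ConcurrencyMode = ConcurrencyMode.Multiple,
                  UseSynchronizationContext = false)]
public class SampleCallback : ISampleCallback
{
    private readonly object gate = new object();
    private int received;

    public void ReportResult(string payload)
    {
        // Shared state must now be protected explicitly.
        lock (gate)
        {
            received++;
        }
    }

    public int Received
    {
        get { lock (gate) { return received; } }
    }
}
```

Whether this helps depends on how much of the callback can really run in parallel; if everything ends up inside one lock, the calls are serialized again.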

Ladislav Mrnka
I agree, WCF exposes very little information when it comes to debugging. As far as I know, you'll get this exception if you reuse a proxy client or channel after a fault has been thrown. Make sure to use SvcTraceViewer.exe too. Good luck.
Maxime
There are 48 services running within a single application on the host PC. There are (up to) 48 separate remote processes running on 4 other PCs. I am currently using ConcurrencyMode as single for both ends of the communication. The part about needing to be processed in 30s fits in with what appears to be happening.
MGSoto
The 16MB was the problem, we were completely overloading the network and systems at one point during our tests. Thanks for helping us (me and Firoso) figure out the issue, we have since reduced it to 128KB, and there have been no further issues.
MGSoto
A: 

I can help you find out why the connection is faulted:

There is an issue with WCF where if the channel is faulted, calling Dispose() results in a CommunicationObjectFaultedException (which is what your stack trace shows). After a Channel enters the faulted state Abort() should be called rather than Close(). Unfortunately a WCF client’s Dispose() calls Close(). This means that how you would normally handle a disposable object in C# (like wrapping them in a using statement) is a bug. Here’s Microsoft's official description of the issue.

Your code isn’t showing how you are doing resource clean up, but I’ll bet this is your issue. So in the short term: Catch and log your exception before any WCF resource is disposed (remember that disposing your ChannelFactory automatically disposes all of the channels it created). This will give you a meaningful error message.

In the long term I personally suggest a WCF client wrapper that implements Dispose() correctly.
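
One possible shape for such a cleanup helper is sketched below. `ChannelCleanup.CloseOrAbort` is a hypothetical name, and the code assumes .NET Framework with System.ServiceModel; it only uses the standard ICommunicationObject members:

```csharp
using System;
using System.ServiceModel;

// Hypothetical helper: closes a channel, proxy or factory without throwing on a faulted object.
static class ChannelCleanup
{
    public static void CloseOrAbort(ICommunicationObject channel)
    {
        if (channel == null)
        {
            return;
        }

        try
        {
            if (channel.State == CommunicationState.Faulted)
            {
                // Close() would throw CommunicationObjectFaultedException here.
                channel.Abort();
            }
            else
            {
                channel.Close();
            }
        }
        catch (CommunicationException)
        {
            channel.Abort();
        }
        catch (TimeoutException)
        {
            channel.Abort();
        }
    }
}
```

Calling `ChannelCleanup.CloseOrAbort((ICommunicationObject)controllerChannel)` in place of Dispose() keeps the original exception from being hidden by a second one thrown during cleanup.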

ErnieL
Do you see any using statement in the code shown? With duplex communication the proxy is long-lived and using is not used. But yes, you are right that CommunicationObjectFaultedException is thrown when you call anything except Abort on a faulted channel.
Ladislav Mrnka
+1  A: 

Your channel can end up in the faulted state if you do not wrap service exceptions into FaultException or FaultException<T>:

http://blogs.msdn.com/b/pedram/archive/2008/01/25/wcf-error-handling-and-some-best-practices.aspx

I assume that some other service call throws an exception, the channel is faulted, and then you get the exception that you describe when you attempt to ping the service.
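
A small sketch of the wrapping idea, with hypothetical names (`ISampleService`, `SampleService`, `DoWork`) that are not taken from the question, assuming .NET Framework with System.ServiceModel:

```csharp
using System;
using System.ServiceModel;

// Hypothetical contract, for illustration only.
[ServiceContract]
public interface ISampleService
{
    [OperationContract]
    string DoWork(string input);
}

public class SampleService : ISampleService
{
    public string DoWork(string input)
    {
        try
        {
            return Process(input);
        }
        catch (InvalidOperationException ex)
        {
            // A FaultException travels to the caller as a SOAP fault
            // and leaves the session usable for further calls.
            throw new FaultException(ex.Message);
        }
    }

    private static string Process(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            throw new InvalidOperationException("Input must not be empty.");
        }
        return input.ToUpperInvariant();
    }
}
```

The caller can then catch FaultException and keep using the same channel, whereas an unwrapped exception on a sessionful binding would fault it.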

Stefan Egli