I have a rather specific issue with a WCF host in Azure. Please bear with me as I describe the situation.

We have a WCF host running in an Azure worker role using a net.tcp binding. We have two instances of this worker role running to provide redundancy. For reasons that are irrelevant to our problem, we force these instances to restart by changing the config settings every hour. Thanks to the upgrade domains, one instance restarts before the second, meaning we always have at least one instance running.

Our client code (also running on Azure, but I don't think it would matter where it was) looks very similar to this (function names changed to exaggerate the point):

public BrowseResults Browse(BrowseParameters parameters)
{
    using (Proxy client = CreateProxyWithBindingsAndEndPoints())
    {
        return client.Browse(parameters);
    }
}

private Proxy CreateProxyWithBindingsAndEndPoints()
{
    var binding = new NetTcpBinding(SecurityMode.Transport);

    binding.Security.Transport.ClientCredentialType = TcpClientCredentialType.Certificate;
    binding.Security.Transport.ProtectionLevel = ProtectionLevel.EncryptAndSign;

    var epAddress = new EndpointAddress(
        new Uri("net.tcp://myapp.cloudapp.net:1000/myservice"),
        new DnsEndpointIdentity("my identity"),
        new AddressHeaderCollection());

    var client = new Proxy(binding, epAddress);

    client.ClientCredentials.ClientCertificate.Certificate = GetClientCertificate();

    return client;
}

My expectation from this is that we are creating a new Proxy, with a new channel and a new connection every time we call this Browse function.

Our problem occurs when one of the instances is restarted: we get System.ServiceModel.CommunicationObjectFaultedException: The communication object, System.ServiceModel.Channels.ServiceChannel, cannot be used for communication because it is in the Faulted state. We only get one of these errors for each host that restarts, but it's still an error we want to do without.

My current working hypothesis is that somewhere under the hood the WCF client is holding open a connection to the instance that is no longer there, despite the fact that everything I've read says that it shouldn't be.

Is there anything I can do to avoid this problem other than just catching this particular error and retrying? Are there any patterns for retrying client calls? If I do retry, how can I ensure that this dodgy connection really has been done away with? My attempts at retries so far haven't been very successful.

A: 

After quite a bit of investigation, the problem appears to be not with the client but with the server. The worker role was starting the WCF host in its Run method. The problem is that by the time the worker role reaches Run, it has already signalled to the load balancer that it's ready to receive network traffic. Since the host hadn't actually started yet, the instance wasn't really ready.

The solution was to move the code which starts the WCF host to the OnStart method.
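
For illustration, here is a minimal sketch of what that move might look like. It is not our actual role code: the class name WcfWorkerRole and the CreateHost hook are hypothetical stand-ins for whatever code builds your ServiceHost (binding, endpoint, certificates), and it assumes the classic Microsoft.WindowsAzure.ServiceRuntime worker role API.

```csharp
using System.ServiceModel;
using System.Threading;
using Microsoft.WindowsAzure.ServiceRuntime;

public abstract class WcfWorkerRole : RoleEntryPoint
{
    private readonly ManualResetEvent stopRequested = new ManualResetEvent(false);
    private ServiceHost host;

    // Hypothetical hook: returns a fully configured (but not yet opened) host.
    // This is where the code that used to sit at the top of Run would go.
    protected abstract ServiceHost CreateHost();

    public override bool OnStart()
    {
        // Open the host before OnStart returns, so the instance is already
        // listening by the time it reports itself as ready for traffic.
        host = CreateHost();
        host.Open();
        return base.OnStart();
    }

    public override void Run()
    {
        // Nothing left to start here; just keep the role alive until OnStop.
        stopRequested.WaitOne();
    }

    public override void OnStop()
    {
        stopRequested.Set();
        if (host != null)
        {
            // Close throws on a faulted host, so fall back to Abort in that case.
            if (host.State == CommunicationState.Faulted)
                host.Abort();
            else
                host.Close();
        }
        base.OnStop();
    }
}
```

If Open throws inside OnStart, the exception propagates and the instance never reports ready, which is arguably the behaviour you want here.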

We also created some pretty nice WCF client retry code, which we now don't seem to need.
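
For anyone who does want a retry, the following is a rough sketch of one common shape for it, not the code we actually wrote. The WcfRetry class, its Call method and the maxAttempts parameter are names made up for this example. It relies on the fact that disposing a faulted WCF client calls Close, which itself throws CommunicationObjectFaultedException, so it uses Close/Abort explicitly instead of a using block and builds a brand new client for every attempt.

```csharp
using System;
using System.ServiceModel;

public static class WcfRetry
{
    // Hypothetical helper: runs one operation against a freshly created client,
    // retrying transport-level failures up to maxAttempts times.
    public static TResult Call<TClient, TResult>(
        Func<TClient> createClient,
        Func<TClient, TResult> operation,
        int maxAttempts = 3)
        where TClient : ICommunicationObject
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException("maxAttempts");

        for (int attempt = 1; ; attempt++)
        {
            // A new client per attempt, so a faulted channel is never reused.
            TClient client = createClient();
            try
            {
                TResult result = operation(client);
                client.Close();
                return result;
            }
            catch (FaultException)
            {
                // The service itself reported an error; retrying is unlikely to help.
                client.Abort();
                throw;
            }
            catch (CommunicationException)
            {
                // Includes CommunicationObjectFaultedException.
                client.Abort();
                if (attempt >= maxAttempts) throw;
            }
            catch (TimeoutException)
            {
                client.Abort();
                if (attempt >= maxAttempts) throw;
            }
            catch (Exception)
            {
                // Anything unexpected: release the channel and let it propagate.
                client.Abort();
                throw;
            }
        }
    }
}
```

With the client code from the question, the body of Browse would become something like WcfRetry.Call(() => CreateProxyWithBindingsAndEndPoints(), client => client.Browse(parameters)). Note that a failure in Close after a successful call also triggers a retry, so this is only safe for operations that can be repeated without side effects.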

knightpfhor