All,

I have a WCF web service (let's call it service "B") hosted under IIS using a service account (VM, Windows 2003 SP2). The service exposes an endpoint that uses WSHttpBinding with the default values, except for maxReceivedMessageSize, maxBufferPoolSize, maxBufferSize and some of the timeouts, which have been increased.

The web service has been load tested using Visual Studio Load Test framework with around 800 concurrent users and successfully passed all tests with no exceptions being thrown. The proxy in the unit test has been created from configuration.

There is a SharePoint application that uses the Office SharePoint Server Search service to call web services "A" and "B". The application gets data from service "A" to create a request that is then sent to service "B". The response coming from service "B" is indexed for search. The proxy is created programmatically using the ChannelFactory.

When service "A" takes less than 10 minutes, the calls to service "B" are successful. But when service "A" takes more time (~20 minutes), the calls to service "B" throw the following exception:

Exception Message: An unsecured or incorrectly secured fault was received from the other party. See the inner FaultException for the fault code and detail.

Inner Exception Message: The message could not be processed. This is most likely because the action 'namespace/OperationName' is incorrect or because the message contains an invalid or expired security context token or because there is a mismatch between bindings. The security context token would be invalid if the service aborted the channel due to inactivity. To prevent the service from aborting idle sessions prematurely increase the Receive timeout on the service endpoint's binding.

The binding settings are the same on both sides. The clocks on the client server and the web service server are synchronized with the Windows Time service, and both machines are in the same time zone.

When I look at the server where web service "B" is hosted, I can see the following security errors being logged:

Source: Security

Category: Logon/Logoff

Event ID: 537

User NT AUTHORITY\SYSTEM

Logon Failure:

Reason: An error occurred during logon

Logon Type: 3

Logon Process: Kerberos

Authentication Package: Kerberos

Status code: 0xC000006D

Substatus code: 0xC0000133

According to some of the blogs online, the status code means STATUS_LOGON_FAILURE and the substatus code means STATUS_TIME_DIFFERENCE_AT_DC, but I have already checked both the server and client clocks and they are synchronized.

I also noticed that the security token seems to be cached somewhere on the client server. They have another process that calls web service "B" using the same service account, and it successfully gets data the first time it is called. Then they start the process that updates the Office SharePoint Server Search service indexes, and it fails. If they then call the first process again, it fails too.

Has anyone experienced this type of problem, or does anyone have any ideas?

Regards,

--Damian

A: 

Tricky - a few questions that might shed some light:

  • what kind of security (transport vs. message, what kind of credentials?) are you using?
  • if you're using Windows credentials anywhere: are all the parties involved (client, servers A and B) in the same domain, or in domains with a two-way trust relationship?

Marc

marc_s
A: 

I'm using the out-of-the-box WSHttpBinding (I just increased the maxReceivedMessageSize and some of the timeouts), so it must be using message-level security with Windows credentials. The client and servers are all on the same domain.

A: 

Hi.

What I believe is happening here is that your channel is timing out (as you suspect).

If I understand correctly, it is not the calls to service A that are timing out, but rather to service B, before you call your operation.

I'm guessing that you are creating your channel before you call service A, rather than just in time (i.e. before calling service B). You should create the channel (proxy, service client) just before you use it like:

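AResponse aResp = null;
BResponse bResp = null;
using (ServiceAProxy proxyA = new ServiceAProxy())
{
    // Long-running call; no channel to service B exists yet.
    aResp = proxyA.DoServiceAWork();

    // Create the service B proxy only now, right before it is used.
    using (ServiceBProxy proxyB = new ServiceBProxy())
    {
        bResp = proxyB.DoOtherWork(aResp);
    }
}
return bResp;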

I believe, however, that once you get past that problem (service B timing out), you'll find that the SharePoint app's proxy (the one that called service A) will time out. To solve that, you may wish to change your service model from request-response to publish-subscribe.

With long-running services, you'll want your SharePoint app to subscribe to service A, and have service A publish its results when it is ready to do so, regardless of how long that takes.

Programming WCF Services (O'Reilly) by Juval Lowy has a great explanation, and IDesign (Juval's company) has published a great set of coding standards for WCF, as well as the code for a great publish-subscribe framework.

Hope this helps, Assaf.

Assaf Stone
A: 

We are already creating the proxy just before calling service B. I'm not sure why the service should care what I do before calling it.

My first guess is that the security token is issued once for service A and when calling the service B, it re-uses the same token which has already expired.

+2  A: 

Ten minutes is the default receive timeout. If a proxy sits idle for more than 10 minutes, the server aborts that proxy's security session. Enable logging and you will see this in the server's diagnostics log. The error message you reported fits this behavior. Search your system diagnostic file for "SessionIdleManager". If you find it, the above is your problem.

Give it a whirl and set the establishSecurityContext="false" for the client and the server.
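For reference, here is a rough sketch of where that attribute lives in a wsHttpBinding configuration. The binding name "wsHttpNoSecureSession" is made up for illustration, and the rest of the binding (sizes, timeouts) is left at whatever you already have; treat this as an assumption about your setup, not a drop-in config.

```xml
<!-- Sketch only: the binding name is a hypothetical example. -->
<system.serviceModel>
  <bindings>
    <wsHttpBinding>
      <binding name="wsHttpNoSecureSession">
        <security mode="Message">
          <!-- Windows is already the default credential type; shown for clarity.
               establishSecurityContext="false" turns off the secure conversation
               session, so there is no security context token left to expire. -->
          <message clientCredentialType="Windows"
                   establishSecurityContext="false" />
        </security>
      </binding>
    </wsHttpBinding>
  </bindings>
</system.serviceModel>
```

The setting has to match on both ends. Since the client builds its proxy with ChannelFactory in code, the equivalent there should be setting binding.Security.Message.EstablishSecurityContext = false on the WSHttpBinding instance before creating the factory. The likely trade-off is some extra security overhead per call, because each message is secured on its own rather than under a shared session.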

Alex
A: 

Don't call the service operation in a using statement. Instead use a pattern such as...

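ServiceClient client = new ServiceClient("Ws<binding>");
try
{
    client.Operation(x, y);
    // Close gracefully on success.
    client.Close();
}
catch
{
    // Close can throw on a faulted channel, so abort instead.
    client.Abort();
    // Rethrow so the caller still sees the original failure.
    throw;
}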

I don't understand why this works but I would guess that when the proxy goes out of scope in the using statement, Close isn't called. The service then waits until receiveTimeout (on the binding) has expired and then aborts the connection causing subsequent calls to fail.

MarkB
A: 

I actually triggered this error just now by doing something silly. I have a unit test that modifies the system date in order to test some time-based features. And I guess the apparent time difference between when I created the context and when I called my method (because of the changes to the system date), caused something to expire.

strongopinions