I have a troublesome problem which I'm at a loss to explain. To put it simply, the CPU use is inexplicably high on the web servers in my web farm.
I have a large number of users hitting two front-end web servers. 99% of the page loads are Ajax requests and serve a simple JSON-serialized object which the web servers retrieve from a backend using WCF. In the typical case (again, probably 99% of the requests), all the ASPX page is doing is making a WCF call to get this data, serializing it into a JSON string and returning it.
The object is pretty small: a GUID, a couple of short strings, a few ints.
The non-typical case is the initial page load, which does the same thing (WCF request) but injects the response into different parts of the page using asp:Literal controls.
All three machines (2 web servers, one backend) have the same hardware specs. I would expect the backend to do the majority of the work in this situation, since it's managing all the data, doing the lookups, etc. BUT: the load on the backend is much less than the load on the front ends. The backend is a nice, level 10-20% CPU load. The front ends run an average of 30%, but they're all over the map, sometimes hitting spikes of 100% for 10 seconds and taking 600ms to serve these very simple pages.
When I run the front end under a profiler (ANTS), it flags the WCF communication as taking 80% of the CPU time. That's the whole call on the .NET-generated WCF proxy.
WCF Setup: the service is fully parallel. I have instancing set to "single" (InstanceContextMode.Single) and concurrency set to "multiple" (ConcurrencyMode.Multiple). I opened up the maxConnections and listenBacklog on the service to 256. Under heavy strain (500 requests/s) I see about 75 connections open between both front-end servers and the service, so it's not hitting that wall. I have security set to 'none' all around. Bandwidth use is about 4% of the potential (4Mb/s on a 100Mb/s network).
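For reference, here is a minimal sketch of the service-side setup described above, written in code rather than config. The contract name IMyService, the string return type, the console host, and the net.tcp address are all placeholders (the real service runs inside a Windows service and returns a small data object); only the instancing, concurrency, security, and connection settings come from the description.

```csharp
using System;
using System.ServiceModel;

// Hypothetical contract; the real one is not shown here.
[ServiceContract]
public interface IMyService
{
    [OperationContract]
    string Call();
}

// One shared service instance, invoked concurrently by many threads.
[ServiceBehavior(InstanceContextMode = InstanceContextMode.Single,
                 ConcurrencyMode = ConcurrencyMode.Multiple)]
public class MyService : IMyService
{
    public string Call() { return "ok"; }
}

public static class HostSketch
{
    public static void Main()
    {
        // netTcp binding with security off and the raised limits.
        var binding = new NetTcpBinding(SecurityMode.None)
        {
            MaxConnections = 256,
            ListenBacklog = 256
        };

        // Passing an instance requires InstanceContextMode.Single.
        var host = new ServiceHost(new MyService());
        host.AddServiceEndpoint(typeof(IMyService), binding,
            "net.tcp://localhost:9000/MyService"); // placeholder address
        host.Open();

        Console.WriteLine("Listening; press Enter to stop.");
        Console.ReadLine();
        host.Close();
    }
}
```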
On the client (the web servers), I create a static ChannelFactory for the service. Code to call the service looks like:
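    // Create a channel per call from the shared (static) factory.
    service = MyChannelFactory.CreateChannel();
    try {
        service.Call();
        service.Close();   // graceful close on success
    } catch {
        service.Abort();   // on any failure, tear the channel down immediately
    }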
(simplified, but you get the basic picture)
What I don't understand is where all this load on the front end is coming from. What's strange about it is that it's never in the 30%-90% range. It's either in panic mode (100%) or doing OK (30% or less). Given the load on the backend, though, I'd expect both of these machines to be 10% or less. Memory use, handles, etc., all seem reasonable.
To add one more wrinkle: when I log how long it takes to service these calls on the backend, I get times consistently less than 15ms (maybe one or two spikes to 30ms every minute). On the front end, these calls can take up to 1s to return. I guess that could be because of the CPU problems, but it seems off to me.
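One way to narrow down where the extra time goes on the front end is to time each step of the client call separately (creating the channel, the call itself, closing) and compare those numbers with the 15ms measured on the backend. The helper below is only a sketch: CallTimer and TimeMs are hypothetical names, and the Thread.Sleep is a stand-in for the real proxy call.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class CallTimer
{
    // Runs the action once and returns the elapsed wall-clock time in milliseconds.
    public static double TimeMs(Action action)
    {
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }
}

public static class TimingDemo
{
    public static void Main()
    {
        // Stand-in for service.Call(); replace with the real proxy call.
        double callMs = CallTimer.TimeMs(() => Thread.Sleep(15));
        Console.WriteLine("Call took {0:F1} ms", callMs);
    }
}
```

Wrapping CreateChannel(), Call() and Close() in separate TimeMs calls would show whether the time is spent in the round trip or in channel setup and teardown.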
So... does anyone have any ideas on where to look on this kind of thing? I'm running short on things to explore.
Clarification: The WCF service is hosted in a Windows service, and is using a netTcp binding. Also, I have the maxConnections on the client set to 128, FWIW.