We're getting ready to switch our ASP.NET application to a new web farm environment. However, our testing has revealed an intermittent problem where a page takes up to 2 minutes to finish loading, when normally it takes less than 2 seconds. Browser diagnostic tools (like Firebug) show that the delay occurs while the page is loading the jQuery library and our style sheet. I don't think there is a problem with those files, but since I don't know what the actual problem is, I can't be sure.

Here's some more information about our environment. Our web servers are running Windows Server 2008 (64-bit), IIS 7, and .NET 3.5. We're using a Cisco CSS load balancer configured to send traffic to whichever server currently has the least load (i.e. the load balancer is not using sticky sessions). The web servers are configured to use a third server as the session store (that server is running the ASP.NET Session State Service).

Any ideas about what could be causing the delay?


UPDATE:

Thanks for the responses thus far. In answer to a few of the suggestions, I can definitely say that the problem is not due to the initial page load or any heavy 3rd party controls. We can hit the page up to 100 times in a row with no delay, and then all of a sudden the delay occurs on the 101st hit. Also, this may be an important clue... when the delay occurs, I can immediately hit reload in my browser and the page returns to its speedy load time... would that point more towards a network/DNS issue?


UPDATE 2:

It seems like the error only ever occurs while downloading the jQuery library. I'm sure that most of the time it's picking it up from the local cache, but even if the local copy expired and it downloads a new copy, it shouldn't take 2 minutes to download a minified jQuery library that is only ~56KB in size.


UPDATE 3:

After trying Fiddler (for the first time), I was able to reproduce the problem. This time the delay occurred while downloading an image file from the server. And, IT OCCURRED WHILE RUNNING ON OUR OLD SERVER - NOT THE WEB FARM! Here's what Fiddler said about that file. Any ideas on what conclusions to draw from this?

Request Count:  1
Bytes Sent:     753
Bytes Received: 242

ACTUAL PERFORMANCE
ClientConnected:    19:53:15:5921
ClientDoneRequest:  19:53:15:8421
Gateway Determination:  0ms
DNS Lookup:      0ms
TCP/IP Connect:  31ms
ServerGotRequest:   19:55:25:7640
ServerBeginResponse:    19:55:25:7952
ServerDoneResponse: 19:55:25:7952
ClientBeginResponse:    19:55:25:7952
ClientDoneResponse: 19:55:25:8108

    Overall Elapsed: 00:02:10.2187500

RESPONSE CODES
HTTP/304:   1

RESPONSE BYTES (by Content-Type)
~headers:   242

AND the response headers are as follows:

HTTP/1.1 304 Not Modified
Cache-Control: max-age=2592000
Last-Modified: Tue, 04 Aug 2009 05:11:20 GMT
Accept-Ranges: bytes
ETag: "ed5bca0c214ca1:f3a"
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Date: Tue, 08 Dec 2009 02:55:37 GMT
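
To see where in the Fiddler timeline the two minutes actually go, the timestamps above can be subtracted pairwise. The following is only a rough sketch: it assumes the last colon-separated field is fractional seconds (which is consistent with the reported Overall Elapsed of 00:02:10.21875), and the function names are made up for illustration.

```python
from datetime import timedelta

def parse_fiddler_time(stamp):
    """Convert an 'HH:MM:SS:ffff' stamp into a timedelta since midnight.

    Assumption: the final field is a fraction of a second.
    """
    hours, minutes, seconds, fraction = stamp.split(":")
    micros = int(fraction.ljust(6, "0"))  # '5921' -> 592100 microseconds
    return timedelta(hours=int(hours), minutes=int(minutes),
                     seconds=int(seconds), microseconds=micros)

# Timestamps copied from the Fiddler statistics above.
TIMELINE = [
    ("ClientConnected", "19:53:15:5921"),
    ("ClientDoneRequest", "19:53:15:8421"),
    ("ServerGotRequest", "19:55:25:7640"),
    ("ServerBeginResponse", "19:55:25:7952"),
    ("ServerDoneResponse", "19:55:25:7952"),
    ("ClientBeginResponse", "19:55:25:7952"),
    ("ClientDoneResponse", "19:55:25:8108"),
]

def gaps(timeline):
    """Return (from_event, to_event, seconds) for each consecutive pair."""
    parsed = [(name, parse_fiddler_time(stamp)) for name, stamp in timeline]
    return [(a[0], b[0], (b[1] - a[1]).total_seconds())
            for a, b in zip(parsed, parsed[1:])]

for start, end, seconds in gaps(TIMELINE):
    print(f"{start} -> {end}: {seconds:.4f} s")
```

Under that assumption, about 129.92 of the 130.22 seconds fall between ClientDoneRequest and ServerGotRequest, while the server-side steps after ServerGotRequest take only a few hundredths of a second.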
A: 

Have you considered serving static files from another webserver configured for that purpose - perhaps at a different domain?

Tom Leys
A: 

I usually use a tool like Fiddler, Charles, or HttpWatch (the basic edition is free) to diagnose this kind of problem, though Firebug is equally good. How many requests is the page making? Are you using any third-party controls (like Telerik), which can sometimes be heavy? Maybe your application or page is heavy and this is the first hit for that page? And are you caching the static resources on the client side (Expires header)?

ram
+1  A: 

You might use Fiddler and/or Wireshark to narrow down the problem.

Possible candidates: DNS problems, a large back-end database query that's filling cache for the first time, a timeout of some kind, an incorrectly configured load balancer, initial compilation of your ASP.NET code on first access (it's done on a per-folder basis by default), network errors -- and the list goes on.

RickNZ
OK. I used Fiddler and was able to reproduce the problem. I posted the results under "Update 3" above.
chief_wampum
A: 

I suggest you do a cold start of the app to see how long it takes to start in general. You can do this by cycling the app pool supporting the Virtual Directory. What may be happening is that the app pool is timing out and being unloaded after 20 minutes of inactivity. If that's the case, the 101st request may be restarting it, and there may be time-consuming warm-up procedures. If you find this is the case, I suggest you create a simple keep-alive procedure that ensures the app is always in a warm state. You can do this by scheduling a task to browse a URL on your site that has a no-cache meta header.
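
A keep-alive task along these lines could be sketched as a small script. This is only an illustration, not a tested setup: the URL and interval are hypothetical placeholders, and instead of relying on a no-cache meta header in the page it sends no-cache request headers, which is a slightly different technique with the same goal of reaching the server on every hit.

```python
import time
import urllib.request

KEEP_ALIVE_URL = "http://example.com/keepalive.aspx"  # hypothetical URL
INTERVAL_SECONDS = 10 * 60  # assumed: comfortably under a 20-minute idle timeout

def build_request(url):
    """Build a GET request that asks caches not to answer on the server's behalf."""
    return urllib.request.Request(
        url, headers={"Cache-Control": "no-cache", "Pragma": "no-cache"})

def ping(url, opener=urllib.request.urlopen):
    """Fetch the URL once and return the HTTP status code."""
    with opener(build_request(url), timeout=60) as response:
        return response.status

def run_forever(url=KEEP_ALIVE_URL, interval=INTERVAL_SECONDS):
    """Ping the URL at a fixed interval, logging failures instead of stopping."""
    while True:
        try:
            print(f"{time.strftime('%H:%M:%S')} status {ping(url)}")
        except OSError as exc:  # urllib's URLError and HTTPError derive from OSError
            print(f"{time.strftime('%H:%M:%S')} failed: {exc}")
        time.sleep(interval)
```

Calling `run_forever()` starts the loop; alternatively, a scheduled task could call `ping()` once per run and let the scheduler handle the interval.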

Nissan Fan
Thanks for the suggestion. Recycling the app results in no perceptible delay at all. On our old server, a recycle could cause up to a 30 second delay, but the new servers are apparently much quicker.
chief_wampum
A: 

One thing well worth ruling out is whether the app domain is being recycled, causing the entire application to restart (is two minutes roughly the time it takes the whole app to start from cold?).

Several things can cause this including IIS deciding to recycle the app pool according to its app pool settings (memory threshold, recycle time, etc), OR due to an unhandled exception bubbling the whole way up to the top.

The quickest way to detect this, IMHO, is to run Sysinternals' Process Explorer on the server and add the "Total AppDomains" column from the .NET columns tab. Then keep an eye on the relevant ASP.NET process. If the total app domain count rises every time you experience the two-minute delay, then the delay is due to an app domain recycle.

Rob Levine
Thanks for the suggestion, but I think I can rule that one out too. Currently, the app pool is set to recycle every 1740 minutes (must be the default, because we didn't adjust those settings yet). Initial app startup has never taken 2 minutes... probably up to 30 seconds tops.
chief_wampum
A: 

You said you are in a web farm environment. I would connect to each box in the farm directly, bypassing the load balancer. This way you can test the responsiveness of each box. It may be that only one box in the farm is causing the issue, or that the load balancer is.
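
One way to run that comparison is a small script that requests the same resource from each box by address and times the response. This is a sketch under assumptions: the function names are invented for illustration, and the server addresses, site host name, and path would be whatever applies in your farm.

```python
import time
import urllib.request

def timed_get(url, host, timeout=180, opener=urllib.request.urlopen):
    """Fetch url with an explicit Host header; return elapsed seconds.

    The Host header lets a request sent to a server's own address still
    reach the right site binding without going through the load balancer.
    """
    request = urllib.request.Request(url, headers={"Host": host})
    start = time.perf_counter()
    with opener(request, timeout=timeout) as response:
        response.read()
    return time.perf_counter() - start

def compare_servers(addresses, host, path, attempts=5, fetch=timed_get):
    """Return {address: slowest observed time in seconds} over several attempts."""
    slowest = {}
    for address in addresses:
        times = [fetch(f"http://{address}{path}", host) for _ in range(attempts)]
        slowest[address] = max(times)
    return slowest
```

For example, `compare_servers(["10.0.0.11", "10.0.0.12"], "www.example.com", "/scripts/jquery.min.js")` (all hypothetical values) would report the slowest response seen from each box. Since the delay is intermittent, a high number of attempts is likely needed before one box stands out.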

Tony Borf