Working with one of our partners, we have now developed two separate sets of web services for their use. The first was a simple "post to an HTTPS URL" style web service, which we implemented as an ASP.NET web page that inspects the arguments in the URL and acts accordingly. This "web service" (if you can call it that) has been very stable.

At some point, the partner asked us to begin using SOAP-based web services. At their request, we built them a new set of web services largely based on the previous objects, reimplemented as an actual "Web Service". This web service has not been very stable: around once a week, Nagios alerts us that it is not responding, and a quick iisreset does the trick.

Analyzing the log output and working in a debugger has not led us to anything concrete. The volume on this new web service is actually much lower than the HTTP web service. I think this could be a code problem or a platform problem, or of course something in between.

We've tried, with little improvement:

  • Duplicating the behavior in the lab
  • Debugging in the Visual Studio debugger
  • Tinkering with IIS options, including giving the service its own application pool

My question: what are the next steps for troubleshooting?

Environment: Windows Server 2003 Standard Edition R2 Service Pack 2 (32-bit), Visual Studio 2005, MS SQL 2005, .NET Framework 2.0.50727

+1  A: 

You may get some answers by profiling your web services and understanding how they use their resources. perfmon and procmon are both very useful tools in this regard.

EDIT: Since you say errors happen after about a week, the only thing I can think of is resource usage. Ensure your DB connections are being cleaned up, and any opened files (system call to the exe) are being closed.
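
If you log counters with perfmon and export them to CSV, a quick trend check can show whether something is creeping up between resets. Here is a minimal sketch in Python, assuming the usual perfmon CSV layout (timestamp in the first column, one counter per following column). The function name and the counter you pick, for example `\Process(w3wp)\Handle Count` or `\Process(w3wp)\Private Bytes`, are my assumptions and not something from your setup:

```python
import csv
import io
from datetime import datetime


def counter_slope_per_hour(csv_text, column=1):
    """Return (counter name, least-squares slope in counter units per hour).

    Assumes a perfmon-style CSV: the header row holds the counter paths and
    each data row starts with a timestamp like "07/01/2009 10:00:00.000".
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]

    points = []
    for row in data:
        try:
            stamp = datetime.strptime(row[0], "%m/%d/%Y %H:%M:%S.%f")
            value = float(row[column])
        except (ValueError, IndexError):
            continue  # skip rows with a blank or malformed sample
        points.append((stamp, value))

    if len(points) < 2:
        raise ValueError("need at least two valid samples")

    # Express time as hours since the first valid sample.
    t0 = points[0][0]
    xs = [(stamp - t0).total_seconds() / 3600.0 for stamp, _ in points]
    ys = [value for _, value in points]

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all samples share the same timestamp")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    return header[column], sxy / sxx
```

A slope that stays clearly positive over several days (handles, private bytes, or connection-pool counters) would point toward a leak rather than a platform problem. A flat line suggests looking elsewhere.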

Also, if your web services can tolerate it, IIS has a setting that triggers a periodic recycle of an App Pool to handle cases where performance degrades over time. It's dirty, but it may work well for your case.

Nader Shirazie
Hmm, thanks for the tip on the app pool recycle; that may be just the trick. We're usually good about freeing up DB connections, but will check that again. Same with the system call. Thanks for the direction.
Kyle Hodgson
Our app pool, apparently by default, was set to recycle every 29 hours. I've changed that to "daily at 4 AM". While in there, we noticed that the app pool was limited to one worker process, so we've enabled the "Web Garden" feature to increase this. That seems like the kind of thing you shouldn't have to turn on, but maybe I misunderstand something.
Kyle Hodgson
29 Hours eh? Nice. That'll explain "random" shutdowns. As for web garden -- I'd only enable that if there's a proven reason to do so. We run a heavily loaded webservice, and haven't seen the need to use it yet...
Nader Shirazie
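To see why a 29-hour recycle looks "random", here is a rough sketch that lists where successive recycles land on the clock. The start time is hypothetical and the helper name is mine; only the 29-hour interval comes from the comment above.

```python
from datetime import datetime, timedelta


def recycle_times(start, interval_hours, count):
    """Return the next `count` recycle times for a fixed-interval recycle."""
    step = timedelta(hours=interval_hours)
    return [start + step * n for n in range(1, count + 1)]


if __name__ == "__main__":
    # Hypothetical starting point: the pool last started at midnight.
    for when in recycle_times(datetime(2009, 7, 6, 0, 0), 29, 5):
        print(when.strftime("%a %H:%M"))
```

Starting from midnight, the recycles fall at 05:00, 10:00, 15:00 and 20:00 on successive days, then 01:00. Each recycle lands five hours later in the day than the one before, so sooner or later one falls in business hours. A fixed daily time avoids that drift.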
+1  A: 

Since there isn't much to go on, here's another odd issue we came up against with our web services.

When the web service stops responding, how is memory utilization? We have experienced issues with memory and memory fragmentation relating to busy web services on a system (there were also other things running, causing additional fragmentation). When we refactored the web services to load from smaller DLLs and depend on other libraries (instead of one large library), we were able to resolve the memory fragmentation.

To identify what was occurring, we took a dump of the offending IIS worker process hosting the app pool and then reviewed it using WinDbg. http://www.microsoft.com/whdc/devtools/debugging/default.mspx

Additionally, we used DebugDiag to take the postmortem dumps. http://www.iis.net/downloads/default.aspx?tabid=34&g=6&i=1286

Hope this provides another direction to look at.

Dan
It does; we happen to have a big code library DLL that this depends on. Thanks, I'll check this out.
Kyle Hodgson
Just to add a note: .NET loads the DLL in which the web service resides into memory to check Code Access Security privileges on it. Our web services receive a large amount of traffic, and when we analyzed the LOH (large object heap) there were as many as 400 copies of some of the DLLs by the time we started running into memory issues. A restart of IIS resolved this, but it kicks everyone out of the application.
Dan