We currently have a Live ASP.NET application (Basically a CMS) running on our IIS7 web-server.
Every once and a while (Talking every few months) it's app pool will go to 100% CPU-usage and stay there until the page times out. We've tried increasing the timeout for the page to 30 minutes in the web.config but it still just stayed at full CPU so I'm presuming it's some form of infinite loop.
It is a massive application, one of the biggest we have, and far too large to blindly search for an issue. The prevailing opinion is that since it's so rare we can just restart the app-pool whenever it happens, but I'd much prefer to fix it.
I have access to the code and full administrator access to the hosting server, and the monitoring software we're running gives me plenty of time to be on the server while the issue is taking place but I can't find any way to get useful data about what's going on at the time without adding a massive constant overhead to the site (Which given it'll take months to happen isn't really viable).
I'm wondering if anyone has some advice as to how I could narrow down our search? A stack trace of the currently running threads would be spectacular, but even just a list of the pages that are actively being served would make a huge difference. I can add code to the project to make it more traceable, but logging everything in the hopes of catching it would be unrealistic (It gets a lot of traffic and we don't want to add significant overhead to page loads).