So does the CPU spike happen
immediately preceding the timer
callback, within the timer callback,
or immediately following the timer
callback?
You misunderstand. This is not a spike. If it were, there would be no problem; I can deal with spikes. But it's not... the CPU usage is going up generally. Even when the service is doing nothing, waiting for the next timer hit. When the service starts up, things are nice and calm, and the graph looks like what you'd expect... generally, 0% usage, with spikes to 10% as NHibernate hits the database or the service does some trivial amount of work. But this increases to an across-the-board 25% (more if I let it go too far) usage at all times while the process is running.
That made Ian's suggestion the logical silver bullet (NHibernate does a lot of stuff when you're not looking). Alas, I've implemented his solution, but it hasn't had an effect (I have no proof of this, but I actually think it's made things worse... average usage is seeming to go up much faster now). Note that stripping out the NHibernate "sections" (as you recommend) is not feasible, since that would strip out about 90% of the code in the service, which would let me rule out the timer as a problem (which I absolutely intend to try), but can't help me rule out NHibernate as the issue, because if NHibernate is causing this, then the dodgy fix that's implemented (see below) is just going to have to become The Way The System Works; we are so dependent on NHibernate for this project that the PM simply won't accept that it's causing an unresolvable structural problem.
I just noted a sense of desperation in
the question -- that your problems
would continue barring a small miracle
Don't mean for it to come off that way. At the moment, the services are being restarted daily (with an option to input any number of hours of the day for them to shutdown and restart), which patches the problem but cannot be a long-term solution once they go onto the production machine and start to become busy. The problems will not continue, whether I fix them or the PM maintains this constraint on them. Obviously, I would prefer to implement a real fix, but since the initial testing revealed no reason for this, and the services have already been extensively reviewed, the PM would rather just have them restart multiple times than spend any more time trying to fix them. That's entirely out of my control and makes the miracle you were talking about more important than it would otherwise be.
That is extremely intriguing (insofar
as you trust your profiler).
I don't. But then, these are Windows services written in .NET 1.1 running on a Windows 2000 machine, deployed by a dodgy Nant script, using an old version of NHibernate for database access. There's little on that machine I would actually say I trust.