I hate asking questions like this - they're so undefined... and undefinable, but here goes.

Background: I've got a DLL that contains the guts of an application that runs as a timed process. My timer receives a configuration for the interval at which it runs and a delegate that should be run when the interval elapses. I've got another DLL that contains the process that I inject.
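To make the setup concrete, here is a minimal, language-neutral sketch in Python of a timed-process class that receives an interval and a delegate (a plain callable here). The class name `TimedProcess` and its methods are hypothetical stand-ins for the .NET class described above, not the actual code.

```python
import threading
from typing import Callable, Optional


class TimedProcess:
    """Runs `action` every `interval_seconds` on a background thread.

    Hypothetical stand-in for the timed-process class: the host (service or
    console app) supplies the interval and the delegate.
    """

    def __init__(self, interval_seconds: float, action: Callable[[], None]) -> None:
        self._interval = interval_seconds
        self._action = action
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, name="timed-process", daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self, timeout: Optional[float] = None) -> None:
        # Signal the loop to exit, then wait for the worker thread to finish.
        self._stop.set()
        self._thread.join(timeout)

    def _run(self) -> None:
        # Event.wait returns True as soon as stop() is called, so shutdown
        # does not have to wait out a full interval.
        while not self._stop.wait(self._interval):
            self._action()
```

Because both hosts only construct this object and call `start()`/`stop()`, any difference in behavior has to come from the hosting environment rather than from this loop.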

I created two applications, one Windows Service and one Console Application. Each application reads its own configuration file and loads the same libraries, pushing the configured timer interval and delegate into my timed process class.

Problem: For the last n weeks, up to and including yesterday, everything was working fine in our production environment using the Windows Service. Today, the Windows Service runs for around 20-30 minutes and then hangs (with a timer interval of 30 seconds), but the console application runs without issue and has done so for the past 4 hours. Detailed logging doesn't indicate any failure. It's as if the Windows Service just...dies quietly - without stopping.

Given that my Windows Service and Console Applications are doing the exact same thing, I can only think that there is something that is causing the Windows Service process to hang - but I have no idea what could be causing that. I've checked the configuration files, and they're both identical - I even copied and pasted the contents of one into the other just to be sure. No dice.

Can anyone make suggestions as to what might cause a Windows Service to hang, when a counterpart Console Application using the same base libraries doesn't; or can anyone point me in the direction of tools that would allow me to diagnose what could be causing this issue?

Thanks for everyone's help - still digging.

+1  A: 

I would probably put in some file logging just to see how far the program is getting. It may give you a better idea of what is looping/hanging/deadlocked/crashing.

Joe Philllips
Detailed logging doesn't indicate any failure.
BenAlabaster
+1  A: 

You can try these techniques:

  • Logging: Start logging the flow of the code in the service. Make this parameter-based so you don't have a deluge after you are done. You should log all function names, parameters, and timestamps.

  • Attach Debugger: Locally or remotely attach a debugger with the code to the running service and set appropriate breakpoints (these can be based on the data gathered from logging).

  • PerfMon: Run this utility and gather information about the machine that the service is running on for any additional clues (high CPU spikes, IO spikes, excessive paging, etc.).
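The parameter-based flow logging from the first bullet can be sketched as a decorator that records function name, arguments, and timestamps, controlled by a switch. This is a Python sketch of the idea only; `TRACE_ENABLED`, `traced`, and `process_batch` are hypothetical names, and in a real service the switch would come from the configuration file.

```python
import functools
import logging
import time

log = logging.getLogger("trace")

# Hypothetical switch: in a real service this would be read from config so
# tracing can be turned off once the problem is found.
TRACE_ENABLED = True


def traced(func):
    """Log entry (name, arguments, timestamp) and exit or exception of `func`."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if not TRACE_ENABLED:
            return func(*args, **kwargs)
        log.info("enter %s args=%r kwargs=%r at %.3f",
                 func.__name__, args, kwargs, time.time())
        try:
            result = func(*args, **kwargs)
        except Exception:
            log.exception("raised in %s", func.__name__)
            raise
        log.info("exit %s at %.3f", func.__name__, time.time())
        return result
    return wrapper


@traced
def process_batch(batch_id: int) -> int:
    # Hypothetical unit of work standing in for the injected process.
    return batch_id * 2
```

An "enter" line with no matching "exit" line is exactly the kind of clue that shows where a hang starts.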

Raj More
Already have detailed logging and have already trawled through the logs. Am running the Console application to review realtime logging output to see if that highlights the issue. So far though, in 4 hours, the problem hasn't occurred. The code-bases are identical so if it fails in one, it should fail in the other. Will check Perfmon... not holding my breath it will be useful though.
BenAlabaster
A: 

Microsoft provides a good resource on debugging a Windows Service. That essentially sounds like what you'd have to do, given that your question is so generic. With that said, have any changes been made to the system over the last few days that could adversely affect the service? Have you made any updates to the code that change the way the service might work?

Again, I think you're going to have to do some serious debugging to find your problem.

JasCav
The body of the application is in a separate DLL for the exact reason that debugging Windows Services is a pain in the ass. I wrote the code so that the Windows Service itself contains only the code that is absolutely necessary: the code needed to load the class contained in the DLL, ProcessService.
BenAlabaster
P.S. No changes have been made to the code-base that would cause the Win Svc to fail but not the Console App. If one fails, logically both should. Neither the console shell nor the Windows Service shell have been modified in any way between it working and not working.
BenAlabaster
A: 

What type of timer are you using in the Windows service? I've seen numerous people on SO have problems with timers and Windows services. Here is a good tutorial just to make sure you are setting it up correctly and using the right type of timer. Hope that helps.
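One timer pitfall worth ruling out: with some timer types (in .NET, for example, `System.Timers.Timer` with auto-reset), the callback can fire again on another thread while the previous tick is still running, so slow ticks overlap and contend for the same resources. A Python sketch of a guard that drops a tick instead of running it concurrently; `NonReentrantTick` is a hypothetical helper, not part of any timer API.

```python
import threading
from typing import Callable


class NonReentrantTick:
    """Wraps a tick callback so overlapping invocations are skipped.

    Hypothetical helper: if the timer fires again while the previous tick is
    still running, the new tick returns immediately instead of running.
    """

    def __init__(self, action: Callable[[], None]) -> None:
        self._action = action
        self._busy = threading.Lock()

    def __call__(self) -> bool:
        # acquire(blocking=False) returns False at once if a tick is in progress.
        if not self._busy.acquire(blocking=False):
            return False
        try:
            self._action()
            return True
        finally:
            self._busy.release()
```

Logging whenever a tick is skipped also tells you if the work is starting to take longer than the 30-second interval.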

SwDevMan81
Thanks, I checked that out - I am using the Timer correctly.
BenAlabaster
+8  A: 

You need to figure out what changed on the production server. At first, the IT guys responsible will swear that nothing changed, but you have to be persistent. I've seen this happen so often I've lost count. Software doesn't spoil. Period. The change must have been to the environment.

Difference in execution: You have two apps running the same code. The most likely difference (and culprit) is that the service runs under a different set of security credentials than your console app and might fall victim to security vagaries. Check on that first. Which Windows account is running the service? What is its role and scope? Is there any 3rd party security software running on the server, perhaps killing errant apps? Do you have to register your service with a 3rd party security service? Is your .Net assembly properly signed? Are your .Net assemblies properly registered and configured on the server? Last but not least, don't forget that a debugger user, which you most likely are, gets away with a lot more than many other account types.
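A cheap way to confirm the credentials difference is to have both hosts log who and where they are at startup, then compare the two log lines. A Python sketch of the idea (in .NET, `WindowsIdentity.GetCurrent().Name` gives the account name); `describe_runtime` is a hypothetical helper. The working directory is included because two processes launched in different ways need not start in the same directory.

```python
import getpass
import os
import platform
import sys


def describe_runtime() -> dict:
    """Collect facts about who/where this process runs, for the startup log."""
    try:
        user = getpass.getuser()
    except Exception:  # getuser can fail when no user name is resolvable
        user = "<unknown>"
    return {
        "user": user,
        "pid": os.getpid(),
        "cwd": os.getcwd(),
        "executable": sys.executable,
        "host": platform.node(),
    }
```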

Another thought: Since timing seems to be part of the issue, check the scheduled tasks on the machine. Perhaps there's a process that is set to go off every 30 minutes that is interfering with your own.

Paul Sasik
In answer to: Security - if security had changed, I would expect the service to fail quickly. It's running for a period before failure; The 3rd party security software though may be a possibility. I will check that out. I will also make sure it's properly signed, although, I think it is... it's been a while since the shell was written though so I'll have to double check.
BenAlabaster
+1 That is a good point. We've had a similar problem where the IT guys swear nothing changed, but come to find out they rolled out a bunch of new patches, one of which caused a bunch of problems.
SwDevMan81
Patches are often the culprit, but I've seen a number of situations where seemingly unrelated applications have caused mine to fail.
Paul Sasik
Wondering if there is a Virus Guard on the server that's causing the problem.
BenAlabaster
+2  A: 

You can debug a Windows service by running it interactively within Visual Studio. This may help you to isolate the problem by setting (perhaps conditional) breakpoints.

Alternatively, you can use the Visual Studio "Attach to process" dialog window to find the service process and attach to it with the "Debug CLR" option enabled. Again this allows you to set breakpoints as needed.

Are you using any assertions? If an assertion fires without being redirected to write to a log file, your service will hang, because the assertion dialog waits for input that nobody can give. If the code throws an unhandled exception, perhaps because of a memory leak, then your service process will crash. If you set the Service Control Manager (SCM) to restart your process in the event of a crash, you should be able to see that the service has been restarted. As you have identical code running in both environments, these two situations don't seem likely. But remember that your service is hosted by the SCM, which is a very different environment from the one in which your console app runs.
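Related to unhandled exceptions: depending on the timer type and framework version, an exception escaping a timer callback may either terminate the process or be silently swallowed, and the latter can look exactly like a quiet hang. A Python sketch of wrapping the delegate so every failure is logged with its traceback; `guarded` is a hypothetical helper name.

```python
import logging
from typing import Callable

log = logging.getLogger("timed-process")


def guarded(action: Callable[[], None]) -> Callable[[], None]:
    """Return a callback that logs any exception from `action` instead of letting it escape."""
    def run() -> None:
        try:
            action()
        except Exception:
            # log.exception records the full traceback at ERROR level.
            log.exception("tick failed")
    return run
```

Passing `guarded(delegate)` to the timer instead of the raw delegate means a failing tick shows up in the log, and the next tick still runs.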

I often use a "heartbeat", where each active thread in the service sends a regular (say every 30 seconds) message to a local MSMQ. This enables manual or automated monitoring, and should give you some clues when these heartbeat messages stop arriving.
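The heartbeat idea boils down to recording the last time each thread checked in and flagging any thread that has been silent too long. A Python sketch with an in-process dictionary standing in for the MSMQ queue; `HeartbeatMonitor` is hypothetical, and a `max_silence` of about twice the heartbeat period (e.g. 60 s for 30 s beats) is an assumed choice.

```python
import threading
import time
from typing import Callable, Dict, List, Optional


class HeartbeatMonitor:
    """Tracks each worker's last heartbeat and reports the silent ones.

    In-process stand-in for the MSMQ-based heartbeat: workers call beat(name)
    each cycle, and a watchdog calls stale() periodically.
    """

    def __init__(self, max_silence: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_silence = max_silence
        self._clock = clock
        self._last: Dict[str, float] = {}
        self._lock = threading.Lock()

    def beat(self, name: str) -> None:
        with self._lock:
            self._last[name] = self._clock()

    def stale(self, now: Optional[float] = None) -> List[str]:
        """Names of workers whose last beat is older than max_silence seconds."""
        now = self._clock() if now is None else now
        with self._lock:
            return sorted(n for n, t in self._last.items()
                          if now - t > self._max_silence)
```

The injectable `clock` makes the logic testable without waiting in real time.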

Another possibility is some sort of permissions problem, because the service is probably running with a different local/domain user from the console app.

After the hang, can you use the SCM to stop the service? If you can't, then there is probably some sort of thread deadlock problem. After the service appears to hang, you can go to a command-line and type sc queryex servicename. This should give you the current STATE of the service.
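If you want to check the state automatically rather than by eye, the STATE line of `sc queryex` can be parsed. A Python sketch, assuming the usual English-language output layout (`STATE : 4  RUNNING`); localized Windows output may differ. A service stuck in `3 STOP_PENDING` after a stop request would fit the deadlock theory. The function names are hypothetical.

```python
import re
import subprocess
from typing import Optional, Tuple

# Matches a line such as "        STATE              : 4  RUNNING"
_STATE_RE = re.compile(r"^\s*STATE\s*:\s*(\d+)\s+(\w+)", re.MULTILINE)


def parse_sc_state(output: str) -> Optional[Tuple[int, str]]:
    """Extract (code, name) from `sc queryex` output, or None if no STATE line."""
    m = _STATE_RE.search(output)
    return (int(m.group(1)), m.group(2)) if m else None


def query_service_state(service_name: str) -> Optional[Tuple[int, str]]:
    """Run `sc queryex <service_name>` (Windows only) and parse its STATE line."""
    result = subprocess.run(["sc", "queryex", service_name],
                            capture_output=True, text=True)
    return parse_sc_state(result.stdout)
```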

RoadWarrior
Wasn't aware the "sc" command could query the service, that's very useful. Thanks.
BenAlabaster
A: 

Another potential problem in reference to psasik's answer is if your application is relying on something only available when being run in User Mode.

A service runs in a separate, non-interactive session (session 0, I believe), which in my experience can cause issues if you are trying to determine the state of something that can only be seen in an interactive user session.

Aequitarum Custos
A: 

Smells like a threading issue to me. Is there any threading or async work being done at all? One crucial question is: does the service hang on the same line of code or in the same method every time? Use your logging to find the last thing that happens before a hang, and if it is consistent, post the problem code.
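To find where each thread is stuck, dump the call stack of every thread to the log when the service stops making progress; two threads repeatedly shown waiting on locks point at a deadlock. A Python sketch of the idea (in .NET, a debugger attached to the hung process gives the equivalent view); `dump_all_stacks` is hypothetical, and `sys._current_frames` is a CPython implementation detail.

```python
import sys
import threading
import traceback


def dump_all_stacks() -> str:
    """Return the current call stack of every thread in this process."""
    names = {t.ident: t.name for t in threading.enumerate()}
    parts = []
    for ident, frame in sys._current_frames().items():
        parts.append(f"--- thread {names.get(ident, '?')} (id {ident}) ---\n")
        parts.append("".join(traceback.format_stack(frame)))
    return "".join(parts)
```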

One other tool you may consider is a good profiler. If it is .NET code, I believe RedGate ANTS can monitor it and give you a good picture of any deadlock scenarios.

Daniel
I will check out RedGate ANTS, that may help shed some light on things.
BenAlabaster