I know this is probably the canonical "It depends..." question but I'd appreciate any pointers as to where to start looking.

I have a client/server app talking over Ethernet. On one computer I run the server and a client, and on another just the client. One runs Vista and one runs XP. After an uptime of about 3 weeks the entire computer freezes and nothing works: not the mouse, not the keyboard, nothing. The only option is to power off. Every ten seconds the server sends a ping message to see if the clients are alive; other than that, just a few small messages go back and forth every day.

I'm trying to find out if it's me causing it or something else. I've started a session and after a few days I thought I'd check for strange increases in memory use but beyond that I have very few ideas.

+1  A: 

This isn't going to be the answer, but I'd advise starting by checking your OS event logs and running a perfmon to keep track of memory, cpu usage etc.
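Once you have a few days of logged memory samples, a quick way to tell steady growth from noise is to fit a straight line through them. A minimal sketch, assuming you have already exported the samples as (seconds since start, bytes in use) pairs; the function name and the sample format are made up for illustration and are not something perfmon produces directly:

```python
from typing import Sequence, Tuple


def growth_per_day(samples: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of (seconds, bytes) samples, scaled to bytes per day.

    The sample format is an assumption: convert whatever your monitoring
    tool logs into (elapsed_seconds, bytes_in_use) pairs first.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    # Spread of the time axis; zero means every sample has the same timestamp.
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    seconds_per_day = 86400.0
    return cov / var_t * seconds_per_day
```

A slope that stays clearly positive over several days is a hint of a leak; a slope near zero suggests memory is probably not what runs out after three weeks.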

tarn
CPU usage is very low, below 10% during use and below 3% during idle. Checking the OS logs is a good idea. Are there other, better performance monitors than the Task Manager in XP?
Niklas Winde
Go to the command line or Start -> Run and enter "perfmon".
Niki
Also check Start -> Run EVENTVWR to see if there are any logs from the time of the freeze.
Sohnee
+2  A: 

You could attach a kernel debugger to the OS. That way you should be able to inspect the state of the OS and your process even if the OS is completely unresponsive. (Unfortunately, it's a lot harder than just hitting "break" in VS. I suggest reading John Robbins's "Debugging Applications for .NET and Windows" before trying that.)

You could also try to create memory dumps of your application at regular intervals. You might have to do a little scripting for that, though. (Usually you'd create a dump with a keystroke, using a tool like userdump or adplus, but if the OS is not responding to keystrokes, that won't work.) That way, you know what state your process was in during or shortly before a hang. This page: http://blogs.msdn.com/debuggingtoolbox/default.aspx is a good starting point for scripting WinDbg. (If you don't know what to do with a memory dump, I'd again suggest John Robbins's excellent book on debugging!)
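The "little scripting" could look roughly like the sketch below. It is only an outline under stated assumptions: the dump tool's command line is passed in as a template with an `{out}` placeholder (look up the real arguments for userdump, adplus, or whatever tool you use; none are assumed here), and the `app_*.dmp` file naming and the helper names are invented for the example. Old dumps are pruned so the disk doesn't fill up over three weeks.

```python
import subprocess
import time
from datetime import datetime
from pathlib import Path
from typing import List, Sequence


def dump_path(directory: Path, when: datetime) -> Path:
    """Timestamped file name, so the last dump before a freeze is easy to find."""
    return directory / f"app_{when:%Y%m%d_%H%M%S}.dmp"


def build_command(template: Sequence[str], target: Path) -> List[str]:
    """Substitute the hypothetical {out} placeholder with the dump file path."""
    return [part.replace("{out}", str(target)) for part in template]


def prune(directory: Path, keep: int) -> None:
    """Delete all but the newest `keep` dumps (names sort chronologically)."""
    dumps = sorted(directory.glob("app_*.dmp"))
    for old in dumps[: max(len(dumps) - keep, 0)]:
        old.unlink()


def dump_periodically(template: Sequence[str], directory: Path,
                      interval_s: float, keep: int, rounds: int) -> None:
    """Run the dump command `rounds` times, waiting `interval_s` in between."""
    directory.mkdir(parents=True, exist_ok=True)
    for i in range(rounds):
        target = dump_path(directory, datetime.now())
        # check=False: one failed dump should not stop the whole schedule.
        subprocess.run(build_command(template, target), check=False)
        prune(directory, keep)
        if i + 1 < rounds:
            time.sleep(interval_s)
```

File names have one-second resolution, so this assumes an interval of at least a second; for this problem an interval of minutes or hours is more realistic anyway.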

Other than that, I can only think of standard debugging tricks: Does the problem occur on every PC? Does it happen if there are no client requests? Does it happen sooner if there are more client requests? Does it happen sooner if there is less available physical memory? Try removing parts of your application (maybe on a separate server for testing) and see if the problem still occurs, and so on. Try running it in a VM so you can see if it uses the CPU, hard disk, or network during those "hangs".

Niki
+1  A: 

Some thoughts to consider:

  1. You know the computer doesn't respond, but that doesn't mean it's hung. Does it respond to a ping?
  2. Maybe the disk activity light is on all the time?
  3. You say "no keyboard" - do you mean no caps lock or num lock lights?
  4. Although the .NET application may be the only one you're running at the time, that does not imply it is the cause of the problem. Some background job could be doing it.

For example, I notice that Retrospect backup, when it is creating a snapshot, freezes the entire system for 10-15 minutes. I mean, no caps lock, the clock in the task bar doesn't update, no CTRL-ALT-DEL, can't type into an "Answer" text box in SO, nothing. It had nothing to do with what I was doing at the time, which was answering a question on SO.

After it came back, SO asked if I was a human. My feelings were hurt. ;-)
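For the first point in the list above, one way to check from the other machine is a plain TCP connection attempt against the port the server listens on. This is a minimal sketch and not a real ICMP ping; the function name is made up, and the host and port are whatever your own server uses:

```python
import socket


def tcp_reachable(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened in time.

    This is a stand-in for ping, not ICMP: it only shows that the remote
    network stack still accepts connections on that port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

If the frozen machine still accepts connections while the mouse and keyboard are dead, the kernel is probably still running and the problem is more likely in the UI or a driver than a complete hang.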

John Saunders
A: 

Which computer freezes, the server or client? And what OSes are they running respectively?

As Daniel L noted, tight polling loops can really kill the CPU. If you can, change your code to use event handlers; it's a much more robust solution.
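To illustrate the difference (in Python rather than .NET, and with invented names, so treat it as a sketch of the idea and not as your code): instead of looping and repeatedly asking "is there a message yet?", the worker blocks until one arrives and uses no CPU while idle.

```python
import queue
import threading
from typing import List

STOP = object()  # hypothetical sentinel that tells the worker to exit


def worker(inbox: "queue.Queue[object]", handled: List[object]) -> None:
    """Handle messages as they arrive; get() blocks instead of spinning."""
    while True:
        message = inbox.get()  # sleeps until a message is available
        if message is STOP:
            break
        handled.append(message)


def run_demo() -> List[object]:
    """Send two messages through the queue and return what the worker handled."""
    inbox: "queue.Queue[object]" = queue.Queue()
    handled: List[object] = []
    thread = threading.Thread(target=worker, args=(inbox, handled))
    thread.start()
    for message in ("ping", "status"):
        inbox.put(message)
    inbox.put(STOP)
    thread.join()
    return handled
```

The same pattern exists in .NET as blocking waits and event callbacks; the point is only that the waiting thread is woken by the event and does not burn cycles checking for it.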

Finally, are you certain there's not a hardware problem on the freezing computer?

Ian Kemp