Our server application is listening on a port, and after a period of time it no longer accepts incoming connections. (And while I'd love to solve this issue, it's not what I'm asking about here;)

The strange thing is that when our app stops accepting connections on port 44044, so does IIS (on port 8080). Killing our app fixes everything: IIS starts responding again.

So the question is, can an application mess up the entire TCP/IP stack? Or perhaps, how can an application do that?

Senseless detail: Our app is written in C#, under .Net 2.0, on XP/SP2.

Clarification: IIS is not "refusing" the attempted connections. It is never seeing them. Clients are getting a "server did not respond in a timely manner" message (using the .NET TcpClient).

+4  A: 

You haven't maxed out the available port handles, have you?
netstat -a

I saw something similar when an app was opening and closing ports (but not actually closing them correctly).
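When the netstat listing is long, it can help to count connections per state instead of reading it by eye. A minimal sketch, assuming the Windows `netstat -an` column layout (Proto, Local Address, Foreign Address, State); the function name and the sample addresses are made up for illustration:

```python
from collections import Counter

def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP rows by state in Windows-style `netstat -an` output."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows look like: TCP  <local>  <remote>  <STATE>
        # UDP rows have no state column and are skipped.
        if len(fields) >= 4 and fields[0].upper() == "TCP":
            states[fields[3]] += 1
    return states

# Hypothetical sample, not taken from the machine in the question.
SAMPLE = """\
Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:8080           0.0.0.0:0              LISTENING
  TCP    0.0.0.0:44044          0.0.0.0:0              LISTENING
  TCP    10.0.0.5:1037          10.0.0.9:44044         TIME_WAIT
  TCP    10.0.0.5:1038          10.0.0.9:44044         TIME_WAIT
  UDP    0.0.0.0:500            *:*
"""

if __name__ == "__main__":
    for state, n in count_tcp_states(SAMPLE).items():
        print(state, n)
```

A count of LISTENING or TIME_WAIT rows that keeps growing between runs is the kind of thing to look for.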

RichS
Good suggestion. We've been using netstat and TCPView (http://technet.microsoft.com/en-us/sysinternals/bb897437.aspx) to diagnose, but nothing jumps out at us.
Gene
netstat might not be the best tool because it does not provide much information about the creation of connections.
benc
+1  A: 

Use netstat -a to see the active connections when this happens. Perhaps, your server app is not closing/disposing of 'closed' connections.

leppie
+1  A: 

Did you try this? http://blogs.msdn.com/david.wang/archive/2005/09/21/HOWTO-Diagnose-IIS6-failing-to-accept-connections-due-to-Connections-Refused.aspx

voodootikigod
Excellent link, thanks! It doesn't apply here though - IIS is not "refusing" connections, but instead it is never seeing them. There is no evidence (that I've found) from within the machine that any connection was attempted.
Gene
A: 

I guess the port number comment from RichS is correct.

Other than that, the TCP/IP stack is just a module in your operating system and, as such, can have bugs that might allow an application to kill it. It wouldn't be the first driver to be killed by a program.

(A tip of the hat to Andrew Tanenbaum for insisting that operating systems should be modular instead of monolithic.)

xmjx
+3  A: 

You may well be starving the stack. It is pretty easy to drain in an environment with a high rate of connection opens and closes per second, e.g. a web server handling lots of unpooled requests.

This is exacerbated by the default TIME-WAIT delay: the amount of time a closed socket has to wait before its port can be recycled. On XP that defaults to 240 seconds, if I remember right.

There are a bunch of registry values that can be tweaked. I suggest creating or editing at least the following values under this key:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters

TcpTimedWaitDelay = 30
MaxUserPort = 65534 
MaxHashTableSize = 65536 
MaxFreeTcbs = 16000

Plenty of docs on MSDN & Technet about the function of these keys.
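To get a feel for why these two settings matter, here is a rough back-of-the-envelope sketch. It assumes a simplified model in which every closed connection pins one ephemeral port for the full TIME-WAIT delay and the ephemeral range starts at 1025. The function name is hypothetical, and the "stock" figures (MaxUserPort 5000, 240 second delay) are what I believe the XP defaults to be, so substitute your own values:

```python
def max_sustained_rate(max_user_port: int, time_wait_seconds: int,
                       first_port: int = 1025) -> float:
    """Rough upper bound on new outbound connections per second
    before the ephemeral port range is exhausted."""
    usable_ports = max_user_port - first_port + 1
    return usable_ports / time_wait_seconds

if __name__ == "__main__":
    # Assumed stock XP settings (see the caveat above).
    print("stock:", round(max_sustained_rate(5000, 240), 1), "connections/s")
    # Values suggested in this answer.
    print("tuned:", round(max_sustained_rate(65534, 30), 1), "connections/s")
```

Under this model the tuned values raise the ceiling by roughly two orders of magnitude, but a steady leak will still reach it eventually.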

stephbu
This would have helped us avoid the issue for a while. We would have eventually ground to a halt though, even with this fix. Thanks for the help.
Gene
A: 

I've been in a couple of similar situations myself. A good troubleshooting step is to attempt a connection from the affected machine to a known-good destination that isn't experiencing any connectivity issues at that moment. If the connection attempt fails, you are very likely to get more interesting details in the error message or code. For example, it could say that there aren't enough handles, or that there isn't enough memory.
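A minimal sketch of such a probe, to be run on the affected machine. The `probe` helper is hypothetical and the host and port in the usage example are placeholders; the point is only that the exception type, errno, and message are kept rather than swallowed:

```python
import socket

def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try one outbound TCP connection and describe the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError as exc:
        # The error class and errno often tell a refusal apart from a
        # timeout or from local resource exhaustion.
        return f"failed: {type(exc).__name__} errno={exc.errno} {exc}"

if __name__ == "__main__":
    # Placeholder destination; use a host and port you know to be healthy.
    print(probe("127.0.0.1", 80))
```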

Alexander
+1  A: 

Good suggestions from everyone, thanks for your help.

So here's what was going on: It turns out that we had several services competing for the same port, and most of the time the "proper" service would get the port. Occasionally a second service would grab the port away, and the first service would try to open a different port. From that time on, the services would keep grabbing new ports every time they serviced a request (since they weren't using their preferred ports) and eventually we would exhaust all available ports.

Of course, the actual question was: "Can an application mess up the entire TCP/IP stack?", and the answer to that question is: Yes. One way to do it is to listen on a whole bunch of ports.
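The failure mode described above, quietly falling back to some other port when the preferred one is taken, can be sketched like this. It is an illustration in Python with made-up helper names, not our actual C# code, and it binds only to 127.0.0.1:

```python
import socket

def listen_on(port: int) -> socket.socket:
    """Bind and listen on 127.0.0.1:port; raises OSError if the port is taken."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(("127.0.0.1", port))
        sock.listen(5)
    except OSError:
        sock.close()
        raise
    return sock

def listen_with_fallback(preferred: int) -> socket.socket:
    """The risky pattern: take any free port when the preferred one is busy."""
    try:
        return listen_on(preferred)
    except OSError:
        return listen_on(0)  # port 0 asks the OS for any free port

if __name__ == "__main__":
    owner = listen_on(0)  # stands in for the service that won the port
    taken = owner.getsockname()[1]
    # Each "request" that falls back holds one more port until it is closed.
    held = [listen_with_fallback(taken) for _ in range(5)]
    print("preferred port:", taken)
    print("ports actually held:", [s.getsockname()[1] for s in held])
    for s in held + [owner]:
        s.close()
```

Calling `listen_on` directly and letting the error surface would have made the conflict visible the first time it happened, instead of letting listeners pile up.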

Gene
So you had an infinite listener problem? That should show up in netstat -a, unless the large number of listeners was causing a stack-level problem and never actually opening the listener.
benc
A: 

From a support and sysadmin standpoint, I have seen this only rarely (though more than once), but it certainly can happen.

When you are diagnosing the problem, you should carefully eliminate the possible causes, rather than blindly rebooting the system at the first sign of trouble. I only say this because many customers I work with are tempted to do that.

benc