I have a C# application that has been running fine for several years. It connects via a TCP/IP socket to a machine that sends me stock trade executions.
Recently, I've tried to deploy it to some machines in a new data center that is behind a hardware firewall, and I've started to see some weird dis-connects.
When a dis-connect happens, in my app (the client side), I see nothing unusual except that I stop receiving data over the socket. Wireshark confirms that no data is reaching the socket and my application's receive thread is blocking on the Receive() call when I stop it in the debugger. The socket shows as ESTABLISHED in netstat.
But from the server side, it looks like my client is dis-connecting. Looking at their logs, it looks like the socket on their end usually ends up with either (nRecvd=-1,errno=104) or (nRecvd=0,errno=11). (104 is connection reset by peer).
The dis-connect only seems to happen after a period of in-activity. I have solved this for now by implementing a heartbeat between my client and their server that just sends a short message every 20 seconds and gets a reply. This has caused the dis-connects to drop to 0 over the past few days.
At first, I figured that the hardware firewall was the problem. It was causing the socket to time out after in-activity. But the person in charge of the firewall claims that the timeout for connects on this port (8887) is 2160 minutes.
I am running Windows Server 2003 and .NET 3.5. The trades server is a linux machine (sles9 I believe though I'm not sure).
Any ideas on what might be going on? What could I do to debug this more given that I don't have any access to the firewall logs and no ability to change the code on the trade server?
Thanks, Mike