Hi,

I'm having an issue with the Socket.SendAsync method not detecting a dead TCP connection. In my client/server app, the server sends heartbeats to connected clients at regular intervals.

The issue is that even when the client is dead, the completion callback of the SendAsync method reports "SocketError.Success" and the Socket.Connected property is still true. So, to the server it looks like the heartbeat data was sent properly and the client is still alive.

I see this issue every time the client PC is put to sleep or hibernated, or when the client runs in a VMware instance and that instance is suspended. I do not see it when the client shuts down the application, is killed from Task Manager, etc.

    internal void InternalSendAsync(ByteDataChunk chunk)
    {
        asyncSendArgs.SetBuffer(chunk.Buffer, 0, chunk.Offset);
        asyncSendArgs.UserToken = chunk;
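
        // SendAsync returns false on synchronous completion; the Completed
        // event is not raised in that case, so call the handler directly.
        if (!Socket.SendAsync(asyncSendArgs))
            SendCompleted(Socket, asyncSendArgs);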
    }

    private void SendCompleted(object sender, SocketAsyncEventArgs args)
    {
        if (args.SocketError != SocketError.Success || !Socket.Connected)
        {
            InternalDisconnect(args.SocketError);
            return;
        }

        // all is good & do some other stuff
    }

Does anybody have any idea what's going on here, and why the SendCompleted method does not report a SocketError even though the client is long dead? (I've had the server run for multiple hours and the dead socket was never detected.)

Thanks,

Tom

+2  A: 

From MSDN:

Note that the successful completion of the SendAsync method does not indicate that the data was successfully delivered.

IMO, one of the most difficult parts about networking is you can't be sure that the client ever got the data. If you are implementing a heartbeat system, you should have the client echo back the heartbeat, proving that it is still alive.
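The echo idea can be sketched roughly as below. This is only a minimal illustration, not a definitive implementation: `HeartbeatMonitor`, `RecordEcho`, and `RemoveExpired` are hypothetical names, and it assumes the server calls `RecordEcho` whenever a client echoes a heartbeat back.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: remembers when each client last echoed a heartbeat.
public sealed class HeartbeatMonitor
{
    private readonly TimeSpan timeout;
    private readonly Dictionary<string, DateTime> lastEcho = new Dictionary<string, DateTime>();
    private readonly object gate = new object();

    public HeartbeatMonitor(TimeSpan timeout)
    {
        this.timeout = timeout;
    }

    // Call when a client connects and whenever it echoes a heartbeat back.
    public void RecordEcho(string clientId, DateTime nowUtc)
    {
        lock (gate)
        {
            lastEcho[clientId] = nowUtc;
        }
    }

    // Returns the clients whose last echo is older than the timeout
    // and stops tracking them.
    public List<string> RemoveExpired(DateTime nowUtc)
    {
        var expired = new List<string>();
        lock (gate)
        {
            foreach (var pair in lastEcho)
            {
                if (nowUtc - pair.Value > timeout)
                {
                    expired.Add(pair.Key);
                }
            }
            foreach (var id in expired)
            {
                lastEcho.Remove(id);
            }
        }
        return expired;
    }
}
```

The server would call `RemoveExpired(DateTime.UtcNow)` from a timer and close the sockets of the returned clients, regardless of what the send callbacks reported.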

When you suspend a process or hibernate the computer, I don't think the socket gets closed the way it does when you shut down the machine you are running on.

James
Right - this is the usual approach. If the client hasn't responded to a heartbeat in <timeout> seconds, assume that it's dead and disconnect it.
caf
I agree this would be one way to do this; however, I've never seen this issue with synchronous sends, as I've always received a send error after a while even if the host was put to sleep/hibernate
Tom Frey
A: 

Are the heartbeats actually sent? My suspicion would be the Nagle algorithm. Pull out Wireshark and check what flows on the wire. You can disable Nagle with SocketOptionName.NoDelay. From MSDN:

A successful completion of the BeginSend method means that the underlying system has had room to buffer your data for a network send. If it is important to your application to send every byte to the remote host immediately, you can use SetSocketOption to enable SocketOptionName.NoDelay. For more information about buffering for network efficiency, refer to the Nagle algorithm in MSDN.
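For illustration, disabling Nagle on a TCP socket could look like the sketch below. `NagleExample` and `CreateNoDelaySocket` are made-up names for this example; `SetSocketOption` with `SocketOptionName.NoDelay` is the call the MSDN text refers to.

```csharp
using System.Net.Sockets;

public static class NagleExample
{
    // Creates a TCP socket with the Nagle algorithm disabled.
    public static Socket CreateNoDelaySocket()
    {
        var socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        socket.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.NoDelay, true);
        return socket;
    }
}
```

Setting the `Socket.NoDelay` property to `true` has the same effect.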
Nikolai N Fetissov
Nagle shouldn't have anything to do with this, as even with Nagle turned on the data will be sent after a fixed time interval (usually 200 - 500ms)
Tom Frey
I wonder if the .NET async layer does its own buffering.
Nikolai N Fetissov
A: 

Ignore the Socket.Connected property; it's pretty much useless. In your sample code, you assume that everything's OK if either Socket.Connected is true or there wasn't an error code. The first thing I'd do is remove the Socket.Connected portion.

I recommend keeping an outstanding asynchronous read going at all times along with periodic sends of heartbeats. If the socket is no longer connected, then either the read or write will result in an error.
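A rough sketch of keeping one receive outstanding at all times, assuming the `SocketAsyncEventArgs` pattern from the question. `ReadLoop` and the `onDisconnected` callback are hypothetical names, and the received bytes are simply discarded here. Note that the pending read only completes once data, a close, or a reset arrives, or once the stack gives up on the connection.

```csharp
using System;
using System.Net.Sockets;

// Hypothetical wrapper that keeps one receive pending on a connected socket.
public sealed class ReadLoop
{
    private readonly Socket socket;
    private readonly Action<SocketError> onDisconnected;
    private readonly SocketAsyncEventArgs receiveArgs = new SocketAsyncEventArgs();

    public ReadLoop(Socket socket, Action<SocketError> onDisconnected)
    {
        this.socket = socket;
        this.onDisconnected = onDisconnected;
        receiveArgs.SetBuffer(new byte[4096], 0, 4096);
        receiveArgs.Completed += (sender, args) =>
        {
            // Asynchronous completion: handle it, then post the next receive.
            if (HandleReceive(args))
            {
                StartReceive();
            }
        };
    }

    // Posts a receive and keeps looping while completions are synchronous.
    public void StartReceive()
    {
        while (true)
        {
            bool pending;
            try
            {
                pending = socket.ReceiveAsync(receiveArgs);
            }
            catch (ObjectDisposedException)
            {
                onDisconnected(SocketError.OperationAborted);
                return;
            }
            if (pending)
            {
                return; // The Completed event will fire later.
            }
            if (!HandleReceive(receiveArgs))
            {
                return;
            }
        }
    }

    // Returns true if another receive should be posted.
    private bool HandleReceive(SocketAsyncEventArgs args)
    {
        if (args.SocketError != SocketError.Success)
        {
            onDisconnected(args.SocketError);
            return false;
        }
        if (args.BytesTransferred == 0)
        {
            // Zero bytes with Success means the peer closed the connection.
            onDisconnected(SocketError.Disconnecting);
            return false;
        }
        // Application data (for example a heartbeat echo) would be processed here.
        return true;
    }
}
```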

The send must time out a number of times, with exponential backoff, so it takes a while to detect when the other side disappears (in the case of the program exiting, the OS will immediately respond that the connection is no longer viable). It shouldn't be anywhere near hours, though; a few minutes at the most (assuming a slow network connection to begin with). My sockets regularly detect dropped connections within a second or so.
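To get a feel for how the backoff adds up, here is a small back-of-the-envelope sketch. The starting timeout and the number of attempts are assumptions picked for illustration; the real values depend on the OS and the measured round-trip time. `RetransmitEstimate` and `TotalWait` are hypothetical names.

```csharp
using System;

public static class RetransmitEstimate
{
    // Sums the waits when the timeout doubles after each unanswered attempt.
    public static TimeSpan TotalWait(TimeSpan initialTimeout, int attempts)
    {
        var total = TimeSpan.Zero;
        var current = initialTimeout;
        for (int i = 0; i < attempts; i++)
        {
            total += current;
            current += current; // double the timeout for the next attempt
        }
        return total;
    }
}
```

With an assumed 1 second initial timeout and 5 attempts, the waits are 1 + 2 + 4 + 8 + 16 = 31 seconds before the stack would give up.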

Stephen Cleary
That is exactly what I'm doing, and the question was centered around why the write does not result in an error if the client machine was put to sleep. I can write to that socket for hours via the SendAsync method and it never throws an error if the client was suspended, but it does throw an error if, e.g., the client was killed
Tom Frey
Your code is assuming that if `Socket.Connected` is true, then the connection is still valid. That is wrong. Remove the `Socket.Connected` portion of the check (leaving only the error portion of the check), and see if that works.
Stephen Cleary
You're right, I made an error when posting the code here: my production code does some other stuff, and I mistyped this when I shortened it for the post. Production code is || !Socket.Connected
Tom Frey
A: 

Have you used Wireshark or similar, to see what is happening on the network? One would think that if the TCP subsystem on the client is not acknowledging the packets, then there should be a socket error. Maybe the client is keeping the port open and acknowledging the packet(s). If so, then you might want to try to solve that on the client, or do what Nikolai said.

Stan Kirk
In the capture I see a PSH, ACK, followed by 3 retransmissions, and nothing after that. IMHO, I should receive a timeout exception on the socket because no ACK is received, but I don't?
Tom Frey