views:

53

answers:

1

When my code is blocked in a recv() call and the other side reboots, my side's recv() never learns about it and simply hangs.

How can I avoid this?

+7  A: 

By default, if the other side of the connection disappears without terminating the connection properly, the OS on your side has no way of knowing that no further data will be coming. That's why recv() will block forever in this situation.

If you want to have a timeout, then set the socket to non-blocking and use select() to wait for it to become readable. select() allows you to specify a timeout.

Alternatively, you can set the SO_KEEPALIVE socket option with setsockopt(). This enables TCP "keepalive" probes, which allow your side to detect a stale connection. (Note that with the default settings it can take a long time to detect that the connection has gone; on Linux, for example, the first probe is sent only after two hours of idle time by default.)
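A minimal sketch of enabling keepalives might look like the following. The function name `enable_keepalive` and its parameters are invented for this example. The `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` options are assumed to be available (they are on Linux); on platforms that lack them, only `SO_KEEPALIVE` is set and the system defaults apply.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: turn on TCP keepalives for sockfd.
 * idle_sec     - idle time before the first probe is sent
 * interval_sec - time between unanswered probes
 * probe_count  - unanswered probes before the connection is dropped
 * Returns 0 on success, -1 on failure with errno set. */
int enable_keepalive(int sockfd, int idle_sec, int interval_sec, int probe_count)
{
    int on = 1;

    if (setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;

#if defined(TCP_KEEPIDLE) && defined(TCP_KEEPINTVL) && defined(TCP_KEEPCNT)
    /* Per-socket tuning; assumed available (Linux-style option names). */
    if (setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_sec, sizeof idle_sec) < 0)
        return -1;
    if (setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPINTVL,
                   &interval_sec, sizeof interval_sec) < 0)
        return -1;
    if (setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPCNT,
                   &probe_count, sizeof probe_count) < 0)
        return -1;
#else
    /* No per-socket tuning here; system-wide defaults are used. */
    (void)idle_sec;
    (void)interval_sec;
    (void)probe_count;
#endif
    return 0;
}
```

With something like `enable_keepalive(fd, 60, 10, 5)`, a dead peer would be noticed after roughly 60 + 10 * 5 seconds of silence, at which point a blocked recv() fails with an error instead of hanging.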

caf
Thanks for the answer. What about send? Will it also face a similar problem?
Jay
@Jay: `send()` will unblock with an error after (in the worst case) 4.5 minutes, because the other side will not send an ACK and ACKs have a timeout. TCP is a data-centric protocol: as long as you do not see an error, no data has been lost. But it can't tell you anything about the state of the other side. In your case this is no different from an idle TCP connection, which is why your `recv()` blocks essentially forever.
Dummy00001
Jay
@Jay: Only for `recv()`. `send()` returns `EWOULDBLOCK` only once you fill up your send buffer. `send()` is different, because if the other machine has come back after rebooting, then the new data you `send()` will solicit a connection reset.
caf
@caf, I wrote some test code that kept calling send in a loop in non-blocking mode, and then did a hard reboot of the peer system. After that, send started returning EWOULDBLOCK. So, according to you, this happened because my continuous sending in a loop filled up the send buffer and caused the EWOULDBLOCK, right?
Jay
@Jay: Correct. And if you wait long enough, `send()` will eventually give `ECONNRESET` instead.
caf
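To make the distinction discussed in these comments concrete, here is a small sketch that classifies the result of a non-blocking send. The `send_status` enum and `try_send` function are hypothetical names for this example. `MSG_DONTWAIT` and `MSG_NOSIGNAL` are assumed to be available (as on Linux), and treating `EPIPE` and `ETIMEDOUT` the same way as `ECONNRESET` is a choice made here, not something stated in the thread.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical result codes for one send attempt. */
enum send_status {
    SEND_OK,          /* some bytes were queued; see *sent */
    SEND_WOULD_BLOCK, /* send buffer is full, nothing queued */
    SEND_PEER_GONE,   /* connection reset, broken, or timed out */
    SEND_ERROR        /* any other failure; inspect errno */
};

/* Hypothetical helper: try to send without blocking.
 * MSG_DONTWAIT makes this single call non-blocking and MSG_NOSIGNAL
 * suppresses SIGPIPE; both flags are assumed to exist on this platform. */
enum send_status try_send(int sockfd, const void *buf, size_t len, ssize_t *sent)
{
    ssize_t n = send(sockfd, buf, len, MSG_DONTWAIT | MSG_NOSIGNAL);

    if (n >= 0) {
        *sent = n;
        return SEND_OK;
    }
    *sent = 0;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return SEND_WOULD_BLOCK;   /* buffer full: peer is not draining it */
    if (errno == ECONNRESET || errno == EPIPE || errno == ETIMEDOUT)
        return SEND_PEER_GONE;     /* TCP has given up on this connection */
    return SEND_ERROR;
}
```

A run of `SEND_WOULD_BLOCK` results only shows that the local send buffer is full; by itself it says nothing about whether the peer is still alive, which matches the behaviour described above.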
@caf, Thanks. Now, when I get an EWOULDBLOCK, if I add that socket as a writefd to a select call with a 0 timeout, will it find out when the other side comes back up? Will the select return with an error? I tried testing this, but the behaviour appears to be random: sometimes it stays blocked and sometimes it returns.
Jay
@Jay: You should add the file descriptor to the `exceptfds` - if it's in that set after `select()`, then a `read()` on it should return an error indicating that the connection has failed.
caf
@caf, Thanks for all your replies. They all were very helpful. : )
Jay