Another socket problem.

In my client code, I send a packet and expect a response from the server:


send()

recv() <-- it is blocking

Immediately after send(), the server crashes and reboots itself. In the meantime recv() is waiting, but even after the server is back up, the recv() call keeps hanging. I have added SIGPIPE signal handling, but it's still not able to detect that the socket is broken.

When I cancel the operation, recv() returns an error saying that an interrupt has been issued.

Can anyone help me fix this?

This is in a shared library running on a Solaris machine.

+6  A: 

Maybe you should set a timeout to handle this case. This can easily be done by calling setsockopt with the SO_RCVTIMEO option on your socket:

  struct timeval tv;
  tv.tv_sec = 30;
  tv.tv_usec = 0;
  if (setsockopt(socket_fd, SOL_SOCKET, SO_RCVTIMEO, (char *)&tv,  sizeof tv))
  {
    perror("setsockopt");
    return -1;
  }

Another possibility is to use non-blocking sockets and handle reading and writing with poll(2) or select(2). You should take a look at Beej's Guide to Network Programming.
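A minimal sketch of the poll(2) approach, assuming an already connected socket. The helper name recv_poll_timeout and its -2 "timed out" return code are hypothetical, not part of any library:

```c
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: wait up to timeout_ms milliseconds for sock to
 * become readable, then recv() into buf.
 * Returns the recv() result (> 0 bytes, 0 = peer closed), -1 on error,
 * or -2 if the timeout expired with no data.
 */
ssize_t
recv_poll_timeout(int sock, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = sock;
    pfd.events = POLLIN;
    pfd.revents = 0;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);   /* signal arrived: retry with the full timeout */

    if (rc < 0)
        return -1;          /* poll() itself failed; errno is set */
    if (rc == 0)
        return -2;          /* timed out: the server never answered */

    /* POLLIN, POLLHUP or POLLERR: recv() returns data, 0 or an error */
    return recv(sock, buf, len, 0);
}
```

On a -2 the caller can decide to close the socket and reconnect, which is what you want after the server has rebooted.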

Patrick MARIE
Be careful with SO_RCVTIMEO, though: it does not seem to be supported on all Unix systems. Where it isn't, setsockopt will fail and you will have to use select or poll.
Patrick MARIE
Another one to try is SO_KEEPALIVE, if this is TCP of course :)
Nikolai N Fetissov
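A minimal sketch of turning on SO_KEEPALIVE for a TCP socket; enable_keepalive is a hypothetical helper name. Keep in mind that on many systems the default idle time before the first keepalive probe is two hours and is usually tuned system-wide, so on its own this only unblocks recv() eventually:

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: enable keepalive probes on a TCP socket.
 * If the peer vanishes (or reboots and answers the probe with an RST),
 * a blocked recv() eventually returns -1, typically with ETIMEDOUT or
 * ECONNRESET.
 * Returns 0 on success, -1 on error (errno is set by setsockopt()).
 */
int
enable_keepalive(int sock)
{
    int on = 1;

    return setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, (char *)&on, sizeof on);
}
```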
I know about these options; I was just wondering if there is any other way. So it seems there is no other way except SO_RCVTIMEO.
Adil
SO_RCVTIMEO is not supported in Solaris :( Any other way?
Adil
Well, use non-blocking sockets and select(2) to handle the timeout. This page explains how to do it: http://rhoden.id.au/doc/sockets2.html
Patrick MARIE
+3  A: 

The problem is that the connection is never actually closed. (No FIN packets are sent, etc.; the other end just goes away.)

What you want to do is set a timeout for recv'ing on the socket, using setsockopt(3) with SO_RCVTIMEO as option_name.
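On platforms that do support SO_RCVTIMEO, a timed-out recv() fails with EAGAIN or EWOULDBLOCK rather than returning 0. A sketch of checking for that; the helper name recv_rcvtimeo and its -2 return value are hypothetical:

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>

/*
 * Hypothetical helper: recv() that gives up after "seconds" seconds,
 * using SO_RCVTIMEO (only where the platform supports it).
 * Returns bytes read (> 0), 0 if the peer closed, -1 on error,
 * or -2 if the timeout expired.
 */
ssize_t
recv_rcvtimeo(int sock, void *buf, size_t len, int seconds)
{
    struct timeval tv;
    ssize_t n;

    tv.tv_sec = seconds;    /* note: a zero timeout means "wait forever" */
    tv.tv_usec = 0;
    if (setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, (char *)&tv, sizeof tv) < 0)
        return -1;          /* e.g. option not supported on this system */

    n = recv(sock, buf, len, 0);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;          /* timed out: no data arrived in time */
    return n;
}
```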

Hans W
SO_RCVTIMEO is not supported in Solaris :( Any other way?
Adil
+2  A: 

As others have mentioned, you can use select() to set a time limit for the socket to become readable.

By default, the socket becomes readable when there are one or more bytes available in the socket receive buffer. I say "by default" because this amount can be tuned by setting the socket receive buffer "low water mark" with the SO_RCVLOWAT socket option.

Below is a function you can use to determine whether the socket becomes ready for reading within a specified time limit. It returns 1 if the socket has data available for reading, or 0 if it times out.

The code is based on an example from the book Unix Network Programming (www.unpbook.com) that can provide you with more information.

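#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>

/* Wait up to "timeout" seconds for the socket to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int
readable_timeout(int sock, int timeout)
{
    struct timeval tv;
    fd_set         rset;
    int            isready;

    do {
        /* select() may modify rset and tv, so reset both on every attempt */
        FD_ZERO(&rset);
        FD_SET(sock, &rset);

        tv.tv_sec  = timeout;
        tv.tv_usec = 0;

        isready = select(sock + 1, &rset, NULL, NULL, &tv);
    } while (isready < 0 && errno == EINTR);   /* interrupted by a signal: retry */

    if (isready < 0)
        perror("select");   /* report instead of _exit(): this runs in a shared library */

    return isready;
}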

Use it like this:

if (readable_timeout(sock, 5 /* timeout in seconds */) > 0) {
    recv(sock, ...);
}

You mention handling SIGPIPE on the client side, which is a separate issue. If you are getting SIGPIPE, it means your client is writing to the socket even after having received an RST from the server. That has nothing to do with the blocking call to recv().

The way that could arise is that the server crashes and reboots, losing its TCP state. Your client sends data to the server, which replies with an RST since it no longer has state for the connection. Your client ignores the RST and tries to send more data, and it's this second send() that causes your program to receive the SIGPIPE signal.
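If you would rather have send() report the broken connection than raise SIGPIPE, one common approach is to ignore the signal and check errno for EPIPE. A sketch with hypothetical helper names; note that SIG_IGN changes the disposition for the whole process, which is worth thinking about inside a shared library:

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: call once at startup. Process-wide: affects the
 * whole program, not just this library. */
void
ignore_sigpipe(void)
{
    signal(SIGPIPE, SIG_IGN);
}

/*
 * Hypothetical helper: with SIGPIPE ignored, writing to a connection the
 * peer has reset or closed fails with EPIPE (or ECONNRESET) instead of
 * killing the process.
 * Returns bytes sent, -1 on other errors, or -2 if the connection is gone.
 */
ssize_t
send_checked(int sock, const void *buf, size_t len)
{
    ssize_t n = send(sock, buf, len, 0);

    if (n < 0 && (errno == EPIPE || errno == ECONNRESET))
        return -2;          /* connection is gone: reconnect instead of retrying */
    return n;
}
```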

What error were you getting from the call to recv()?

Todd Hayton
I am not getting any error and recv() just blocks. I will try if select() is feasible for my application. Thanks.
Adil
I tested it and found that select() is the only option, and it works fine in my test scenario.
Adil
A: 

Another way to make the recv() call non-blocking on Solaris is to use fcntl() to put the socket descriptor into non-blocking mode:

int flags = fcntl(sockDesc, F_GETFL, 0);   /* keep the existing flags */
fcntl(sockDesc, F_SETFL, flags | O_NONBLOCK);

This can be used along with select() to protect your recv() from a misleading select() result (in case select() reports the socket as readable but there is no data on it).

Adil