One of my projects on Linux uses blocking sockets. Things happen very serially so non-blocking would just make things more complicated. Anyway, I am finding that often a recv call is returning -1 with errno set to EAGAIN.

The man page only really mentions this happening for non-blocking sockets, which makes sense. With non-blocking, the socket may or may not be available so you might need to try again...

What would cause it to happen for a blocking socket? Can I do anything to avoid it?

At the moment, my code to deal with it looks something like this (I have it throw an exception on error, but beyond that it is a very simple wrapper around recv):

    int ret;
    do {
        ret = ::recv(socket, buf, len, flags | MSG_NOSIGNAL);
    } while(ret == -1 && errno == EAGAIN);

    if(ret == -1) {
        throw socket_error(strerror(errno));
    }
    return ret;

Is this even correct? The EAGAIN condition gets hit pretty often.

EDIT: some things which I've noticed which may be relevant.

  1. I do set a read timeout on the socket using setsockopt, but it is set to 30 seconds. The EAGAINs happen way more often than once every 30 seconds. CORRECTION: my debugging was flawed, EAGAINs don't happen as often as I thought they did. Perhaps it is the timeout triggering.

  2. For connecting, I want to have a connect timeout, so I temporarily set the socket to non-blocking. That code looks like this:

    int   error = 0;
    fd_set   rset;
    fd_set   wset;
    int   n;
    const SOCKET sock = m_Socket;

    // set the socket as nonblocking IO
    const int flags = fcntl(sock, F_GETFL, 0);
    fcntl(sock, F_SETFL, flags | O_NONBLOCK);

    errno = 0;

    // we connect, but it will return soon
    n = ::connect(sock, addr, size_addr);

    if(n < 0) {
        if (errno != EINPROGRESS) {
            return -1;
        }
    } else if (n == 0) {
        goto done;
    }

    FD_ZERO(&rset);
    FD_ZERO(&wset);
    FD_SET(sock, &rset);
    FD_SET(sock, &wset);

    struct timeval tval;
    tval.tv_sec = timeout;
    tval.tv_usec = 0;

    // We "select()" until connect() returns its result or timeout
    n = select(sock + 1, &rset, &wset, 0, timeout ? &tval : 0);
    if(n < 0) {
        // select itself failed; errno is already set
        return -1;
    }
    if(n == 0) {
        errno = ETIMEDOUT;
        return -1;
    }

    if (FD_ISSET(sock, &rset) || FD_ISSET(sock, &wset)) {
        socklen_t len = sizeof(error);
        // getsockopt takes the socket as its first argument
        if (getsockopt(sock, SOL_SOCKET, SO_ERROR, &error, &len) < 0) {
            return -1;
        }
        if (error != 0) {
            // the connect attempt failed; report the pending socket error
            errno = error;
            return -1;
        }
    } else {
        return -1;
    }

    done:
    // We change the socket options back to blocking IO
    if (fcntl(sock, F_SETFL, flags) == -1) {
        return -1;
    }
    return 0;

The idea is that I set it to non-blocking, attempt a connect and select on the socket so I can enforce a timeout. Both the set and restore fcntl calls return successfully, so the socket should end up in blocking mode again when this function completes.

A: 

Dumb question, but are you sure the socket is really blocking? Don't trust the docs, force it to blocking mode and see what happens. You'll get EAGAIN due to a socket timeout, which makes me wonder if your listener is non-blocking and thereby creating non-blocking connection sockets you don't expect.
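
One way to check is to read the file status flags back with fcntl just before the recv call. A minimal sketch, assuming a POSIX system; is_nonblocking and force_blocking are hypothetical helper names, not part of the code in the question:

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/socket.h>

// Hypothetical helper: true if O_NONBLOCK is set on fd.
// Also returns false if fcntl fails (errno is set in that case).
bool is_nonblocking(int fd) {
    assert(fd >= 0);
    const int flags = fcntl(fd, F_GETFL, 0);
    return flags != -1 && (flags & O_NONBLOCK) != 0;
}

// Hypothetical helper: clears O_NONBLOCK and leaves the other flags alone.
// Returns 0 on success, -1 on error.
int force_blocking(int fd) {
    assert(fd >= 0);
    const int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) == -1 ? -1 : 0;
}
```

If is_nonblocking reports false right before a recv that fails with EAGAIN, the socket mode is probably not the cause and a receive timeout is the more likely explanation.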

easel
I have control over both ends, and never set either as non-blocking.
Evan Teran
A socket with a timeout is not a blocking socket...
easel
Not to be argumentative, but sure it is... it does block, just not forever.
Evan Teran
@Erik - Evan is correct. A blocking read with a timeout behaves the same as a blocking read without one. The only difference is when the socket is unblocked.
Tom
Feel free to be pedantic about the terms, but his original problem was that he wasn't handling the timeout. If he'd been thinking of the socket as non-blocking, he would have known he had to handle the timeout exception. In some languages, non-blocking sockets == blocking sockets with timeouts.
easel
A: 

Is it possible that MSG_DONTWAIT is being specified as part of your flags? The man page says EAGAIN will occur if no data is available and this flag is specified.

If you really want to force a block until the recv is somewhat successful, you may wish to use the MSG_WAITALL flag.
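
A rough sketch of what that could look like, assuming a stream socket. recv_exact is a hypothetical name. Even with MSG_WAITALL, recv can still return early on a signal, an error, or a disconnect, so the sketch loops on short reads:

```cpp
#include <cassert>
#include <cerrno>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper: tries to read exactly len bytes.
// Returns len on success, a smaller count if the peer shut down first,
// or -1 on error (errno set by recv). If a receive timeout is set on
// the socket, a timeout shows up here as -1 with errno == EAGAIN.
ssize_t recv_exact(int fd, void* buf, size_t len) {
    assert(buf != nullptr || len == 0);
    char* p = static_cast<char*>(buf);
    size_t got = 0;
    while (got < len) {
        const ssize_t n = ::recv(fd, p + got, len - got, MSG_WAITALL);
        if (n == 0) {
            break;                 // orderly shutdown by the peer
        }
        if (n == -1) {
            if (errno == EINTR) {
                continue;          // interrupted by a signal, try again
            }
            return -1;
        }
        got += static_cast<size_t>(n);
    }
    return static_cast<ssize_t>(got);
}
```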

Rick C. Petty
I just grepped my source tree, MSG_DONTWAIT is not used.
Evan Teran
A: 

I don't suggest this as a first-attempt fix, but if you're all out of options, you can always select() on the socket with a reasonably long timeout to force it to wait for data.
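
Something along these lines, as a sketch only. wait_readable and recv_with_select are hypothetical names, and the -2 return value for a timeout is just a convention picked for this example:

```cpp
#include <cassert>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

// Waits until fd is readable or timeout_ms milliseconds have passed.
// Returns 1 if readable, 0 on timeout, -1 on error (errno set by select).
int wait_readable(int fd, long timeout_ms) {
    assert(fd >= 0 && fd < FD_SETSIZE);   // select cannot handle larger fds
    fd_set rset;
    FD_ZERO(&rset);
    FD_SET(fd, &rset);

    timeval tv;
    tv.tv_sec  = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    const int n = select(fd + 1, &rset, nullptr, nullptr, &tv);
    return n > 0 ? 1 : n;
}

// Calls recv only once select reports pending data (or EOF).
// Returns bytes read, 0 on EOF, -1 on error, -2 on timeout.
ssize_t recv_with_select(int fd, void* buf, size_t len, long timeout_ms) {
    const int ready = wait_readable(fd, timeout_ms);
    if (ready == 0) {
        return -2;
    }
    if (ready < 0) {
        return -1;
    }
    return ::recv(fd, buf, len, 0);
}
```

This keeps the timeout in one place and makes it explicit in the return value, instead of relying on how recv reports it.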

rmeador
+5  A: 

It's possible that you have a nonzero receive timeout set on the socket (via setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO,...)), as that would also cause recv to return EAGAIN.
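
If that is the cause, a loop that retries on EAGAIN turns the timeout into an endless wait. A sketch of a wrapper that reports the timeout instead, assuming Linux, where a blocking recv that hits SO_RCVTIMEO fails with EAGAIN/EWOULDBLOCK. recv_or_throw and socket_timeout are hypothetical names, and socket_error here is only a stand-in for the exception type used in the question:

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

// Stand-in for the question's exception type (assumed, not the real one).
struct socket_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical subtype so callers can tell a timeout from other failures.
struct socket_timeout : socket_error {
    socket_timeout() : socket_error("receive timed out") {}
};

// Blocking recv that retries only when interrupted by a signal.
// Throws socket_timeout when the receive timeout expires and
// socket_error for every other failure. Returns 0 on EOF.
ssize_t recv_or_throw(int fd, void* buf, size_t len, int flags) {
    assert(buf != nullptr || len == 0);
    ssize_t ret;
    do {
        ret = ::recv(fd, buf, len, flags);
    } while (ret == -1 && errno == EINTR);

    if (ret == -1) {
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            throw socket_timeout();
        }
        throw socket_error(std::strerror(errno));
    }
    return ret;
}
```

Whether to retry after a socket_timeout is then a decision for the caller rather than something hidden inside the wrapper.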

Hasturkun
Yes, but it is set to 30000 milliseconds; I get the EAGAINs *way* more often than that. Pretty much constantly.
Evan Teran
*CORRECTION* My debugging was flawed; EAGAINs don't happen as often as I thought they did. Perhaps it is the timeout triggering.
Evan Teran