Hi,

My user-space application sometimes blocks after a system call is interrupted with EINTR.

What I recorded with strace:

time(NULL)                              = 1257343042
time(NULL)                              = 1257343042
rt_sigreturn(0xbff07be4)                = -1 EINTR (Interrupted system call)
--- SIGALRM (Alarm clock) @ 0 (0) ---
time(NULL)                              = 1257343042
futex(0xb7cea80c, 0x80 /* FUTEX_??? */, 2) = ? ERESTARTSYS (To be restarted)
--- SIGUSR1 (User defined signal 1) @ 0 (0) ---
sigreturn()                             = ? (mask now [ALRM])
futex(0xb7cea80c, 0x80 /* FUTEX_??? */, 2) = ? ERESTARTSYS (To be restarted)
--- SIGWINCH (Window changed) @ 0 (0) ---
futex(0xb7cea80c, 0x80 /* FUTEX_??? */, 2) = ? ERESTARTSYS (To be restarted)
--- SIGTERM (Terminated) @ 0 (0) ---
time(NULL)                              = 1257343443
time(NULL)                              = 1257343443
futex(0xb7cea80c, 0x80 /* FUTEX_??? */, 2) = ? ERESTARTSYS (To be restarted)
--- SIGWINCH (Window changed) @ 0 (0) ---
futex(0xb7cea80c, 0x80 /* FUTEX_??? */, 2

Can I catch EINTR, and how can I retry the affected calls such as write, read, or select? How can I determine where the EINTR occurred, even when it happens inside third-party libraries that make system calls?

Why is my app completely blocked after receiving an EINTR (see the strace dump: I sent a SIGUSR1, which should normally be handled)? And why is futex() returning ERESTARTSYS to user space?

thanks

+5  A: 

The code which calls write (or other blocking operations) has to be aware of EINTR. If a signal occurs during a blocking operation, then the operation will either (a) return partial completion, or (b) return failure, do nothing, and set errno to EINTR.

So, for an all-or-fail write operation which retries after interruptions, you'd do something like this:

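// Assumes <errno.h> and <unistd.h>; buf is a char pointer, size is a size_t.
while (size > 0) {
    ssize_t written = write(filedes, buf, size);  // write returns ssize_t, not int
    if (written == -1) {
        if (errno == EINTR) continue;  // interrupted before any data was written: retry
        return -1;                     // genuine error; errno says why
    }
    buf += written;   // partial write: skip past the bytes already written
    size -= written;
}
return 0; // success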

Or for something a bit better behaved, which retries EINTR, writes as much as it can, and reports how much is written on failure (so the caller can decide whether and how to continue partial writes which fail for a reason other than interruption by signal):

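// Assumes <errno.h> and <unistd.h>; buf is a char pointer, size is a size_t.
ssize_t total = 0;
while (size > 0) {
    ssize_t written = write(filedes, buf, size);  // write returns ssize_t, not int
    if (written == -1) {
        if (errno == EINTR) continue;      // interrupted by a signal: retry
        return (total == 0) ? -1 : total;  // report progress made before the error, if any
    }
    buf += written;    // skip past the bytes already written
    total += written;
    size -= written;
}
return total; // bytes written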
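
The question also asks about read, and the same retry pattern carries over. What follows is only a minimal sketch, not something from the original answer: read_full is a hypothetical helper name, and it assumes a blocking file descriptor. The one extra case compared with write is that read returns 0 at end of file, which has to end the loop.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: read up to `size` bytes, retrying on EINTR.
   Returns the number of bytes read (short only at end of file or after an
   error part-way through), or -1 if an error occurred before any data
   was read. */
ssize_t read_full(int fd, void *buf, size_t size)
{
    char *p = buf;
    size_t total = 0;

    while (total < size) {
        ssize_t n = read(fd, p + total, size - total);
        if (n == -1) {
            if (errno == EINTR) continue;            /* interrupted by a signal: retry */
            return total == 0 ? -1 : (ssize_t)total; /* report progress, if any */
        }
        if (n == 0) break;                           /* end of file */
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

A caller compares the return value with the requested size to tell a complete read from a short one.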

GNU has a non-standard TEMP_FAILURE_RETRY macro that might be of interest, although I can never find the docs for it when I want them. Including now.
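
For what it's worth, here is roughly how the macro is used. Treat it as a sketch under assumptions: read_retry is a hypothetical wrapper name, glibc only exposes the macro from <unistd.h> when _GNU_SOURCE is defined before the includes, and the #ifndef fallback is my own addition for C libraries that lack the macro (it needs GCC or Clang statement expressions).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* must precede the includes to expose TEMP_FAILURE_RETRY in glibc */
#endif
#include <errno.h>
#include <unistd.h>

#ifndef TEMP_FAILURE_RETRY
/* Fallback for C libraries without the macro: evaluate the expression again
   for as long as it yields -1 with errno set to EINTR. */
#define TEMP_FAILURE_RETRY(expression)                \
    (__extension__({                                  \
        long int result_;                             \
        do result_ = (long int)(expression);          \
        while (result_ == -1L && errno == EINTR);     \
        result_;                                      \
    }))
#endif

/* Hypothetical wrapper: a read() that retries when interrupted by a signal. */
ssize_t read_retry(int fd, void *buf, size_t count)
{
    return (ssize_t)TEMP_FAILURE_RETRY(read(fd, buf, count));
}
```

The macro only covers the -1/EINTR case. It does nothing about partial writes, so an all-or-fail write still needs a loop like the ones above.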

Steve Jessop
Thanks, I found the documentation for the macro at http://www.gnu.org/s/libc/manual/html_node/Interrupted-Primitives.html. Do you know anything about the blocked futex() call?
Maus
Don't know about the ERESTARTSYS. I think it's an internal implementation detail - you see it in the trace, but shouldn't ever see it returned to user code, because the user-mode system code should either be retrying the call that returned it, or else converting it to EINTR. But I might be wrong on that, I last shaved on Saturday and therefore don't have anything like full Linux geek credentials ;-)
Steve Jessop
futex is a Linux-specific lightweight locking framework for building higher-level things like semaphores and mutexes. See futex(7).
Nikolai N Fetissov
Assuming Linux, check out the 'man 7 signal' section entitled "Interruption of System Calls and Library Functions by Signal Handlers". It is a good reference.
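
That man page section is mostly about the SA_RESTART flag of sigaction. A minimal sketch of installing a handler with and without it, assuming a POSIX system (install_usr1_handler, on_usr1 and got_usr1 are hypothetical names, not anything from the question's code):

```c
#define _POSIX_C_SOURCE 200809L   /* for sigaction under strict -std= modes */
#include <signal.h>
#include <string.h>

/* Set by the handler; sig_atomic_t is safe to write from a signal handler. */
static volatile sig_atomic_t got_usr1 = 0;

static void on_usr1(int signo)
{
    (void)signo;
    got_usr1 = 1;   /* do only async-signal-safe work in a handler */
}

/* Hypothetical helper: install the SIGUSR1 handler. With restart != 0 the
   kernel restarts many interrupted blocking calls automatically; with
   restart == 0 they fail with EINTR and the caller has to retry.
   Returns 0 on success, -1 on error. */
int install_usr1_handler(int restart)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = restart ? SA_RESTART : 0;
    return sigaction(SIGUSR1, &sa, NULL);
}
```

SA_RESTART does not cover everything: the same man page lists calls such as select and poll that fail with EINTR regardless of the flag, so retry loops are still needed around those.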
David Joyner
A: 

See also the discussion of "loser mode" in Worse is Better

Tim Schaeffer