ansaurus

Question

How to join a thread that is hanging on blocking IO?

Answer 1

+2 A:

I think, as you said, the only way would be to send a signal then catch and deal with it appropriately. Alternatives might be SIGTERM, SIGUSR1, SIGQUIT, SIGHUP, SIGINT, etc.

You could also use select() on your input descriptor so that you only read when it is ready. You could use select() with a timeout of, say, one second and then check if that thread should finish.
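The timed select() idea could look roughly like the sketch below. It is only an illustration: the names stop_requested, poll_reader and stop_with_timeout_demo are made up here, and a pipe stands in for whatever descriptor the thread really reads.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <sys/select.h>
#include <unistd.h>

static atomic_int stop_requested;   /* hypothetical flag, set by the joining thread */

/* Worker: waits for input, but wakes at least once a second to check the flag. */
static void *poll_reader(void *arg)
{
    int fd = *(int *)arg;
    char buf[64];

    while (!atomic_load(&stop_requested)) {
        fd_set rfds;
        struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        /* select() may modify tv, so it is rebuilt on every pass. */
        if (select(fd + 1, &rfds, NULL, NULL, &tv) > 0) {
            if (read(fd, buf, sizeof buf) <= 0)
                break;              /* EOF or error: stop reading */
        }
    }
    return NULL;
}

/* Returns 0 if the worker could be joined after the flag was set. */
int stop_with_timeout_demo(void)
{
    int fds[2];
    pthread_t tid;
    int rc;

    if (pipe(fds) != 0)
        return -1;
    if (pthread_create(&tid, NULL, poll_reader, &fds[0]) != 0)
        return -1;
    atomic_store(&stop_requested, 1);
    rc = pthread_join(tid, NULL);   /* returns within about one second */
    close(fds[0]);
    close(fds[1]);
    return rc;
}
```

The cost is the one mentioned above: the thread may take up to the full timeout to notice the request.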

Chris Young 2008-10-15 05:47:47

Answer 2

A:

Can't you just close the socket you're waiting on?

dmityugov 2008-10-15 06:15:24

Answer 3

+2 A:

One solution that occurred to me the last time I had an issue like this was to create a file (e.g. a pipe) that exists only for the purpose of waking up blocked threads.

The idea would be to create a file from the main loop (or 1 per thread, as timeout suggests - this would give you finer control over which threads are woken). All of the threads that are blocking on file I/O would do a select(), using the file(s) that they are trying to operate on, as well as the file created by the main loop (as a member of the read file descriptor set). This should make all of the select() calls return.

Code to handle this "event" from the main loop would need to be added to each of the threads.

If the main loop needed to wake up all of the threads it could either write to the file or close it.

I can't say for sure if this works, as a restructure meant that the need to try it vanished.

Andrew Edgecombe 2008-10-15 06:53:56

Answer 4

+4 A:

Your select() could have a timeout, even if it is infrequent, in order to exit the thread gracefully on a certain condition. I know, polling sucks... another alternative is to have a pipe for each child and add that to the list of file descriptors being watched by the thread. Send a byte to the pipe from the parent when you want that child to exit. No polling at the cost of a pipe per thread.
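A minimal sketch of the pipe-per-thread variant, assuming the thread can be written around select(). The struct worker_fds and the function names are invented for illustration, and a second pipe stands in for the real data descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <sys/select.h>
#include <unistd.h>

struct worker_fds {      /* hypothetical bundle handed to the thread */
    int data_fd;         /* descriptor the thread really wants to read */
    int wake_fd;         /* read end of this thread's wake-up pipe */
};

static void *pipe_reader(void *arg)
{
    struct worker_fds *w = arg;
    int maxfd = w->data_fd > w->wake_fd ? w->data_fd : w->wake_fd;
    char buf[64];

    for (;;) {
        fd_set rfds;

        FD_ZERO(&rfds);
        FD_SET(w->data_fd, &rfds);
        FD_SET(w->wake_fd, &rfds);
        if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            break;
        }
        if (FD_ISSET(w->wake_fd, &rfds))
            break;                  /* the parent asked us to exit */
        if (FD_ISSET(w->data_fd, &rfds) &&
            read(w->data_fd, buf, sizeof buf) <= 0)
            break;                  /* EOF or error on the data descriptor */
    }
    return NULL;
}

/* Returns 0 if one byte on the wake-up pipe was enough to join the thread. */
int wake_pipe_demo(void)
{
    int data[2], wake[2];
    struct worker_fds w;
    pthread_t tid;
    int rc;

    if (pipe(data) != 0 || pipe(wake) != 0)
        return -1;
    w.data_fd = data[0];
    w.wake_fd = wake[0];
    if (pthread_create(&tid, NULL, pipe_reader, &w) != 0)
        return -1;
    if (write(wake[1], "x", 1) != 1)    /* the "please exit" byte */
        return -1;
    rc = pthread_join(tid, NULL);
    close(data[0]); close(data[1]);
    close(wake[0]); close(wake[1]);
    return rc;
}
```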

2008-10-15 06:54:36

Or you could have one pipe for all threads: select/poll reports "ready" status to every thread waiting on a single file descriptor (as long as it is level-triggered). So all threads waiting on a single "killer" pipe would receive the notification to die.
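Here is a rough sketch of the single shared pipe, using poll(). The names killer_rd, shared_pipe_worker and NUM_WORKERS are assumptions for the example. The important detail is that no worker reads the byte, so the descriptor stays readable for all of them.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

#define NUM_WORKERS 3

static int killer_rd;   /* read end of the shared "killer" pipe (hypothetical name) */

static void *shared_pipe_worker(void *arg)
{
    int data_fd = *(int *)arg;
    struct pollfd pfds[2] = {
        { .fd = data_fd,   .events = POLLIN },
        { .fd = killer_rd, .events = POLLIN },
    };
    char buf[64];

    for (;;) {
        if (poll(pfds, 2, -1) < 0) {
            if (errno == EINTR)
                continue;
            break;
        }
        if (pfds[1].revents)
            break;      /* do not consume the byte: the other workers need it too */
        if (pfds[0].revents && read(data_fd, buf, sizeof buf) <= 0)
            break;
    }
    return NULL;
}

/* Returns 0 if a single byte woke and stopped every worker. */
int shared_pipe_demo(void)
{
    int killer[2], data[NUM_WORKERS][2];
    pthread_t tids[NUM_WORKERS];
    int i;

    if (pipe(killer) != 0)
        return -1;
    killer_rd = killer[0];
    for (i = 0; i < NUM_WORKERS; i++) {
        if (pipe(data[i]) != 0)
            return -1;
        if (pthread_create(&tids[i], NULL, shared_pipe_worker, &data[i][0]) != 0)
            return -1;
    }
    if (write(killer[1], "x", 1) != 1)
        return -1;
    for (i = 0; i < NUM_WORKERS; i++) {
        if (pthread_join(tids[i], NULL) != 0)
            return -1;
        close(data[i][0]);
        close(data[i][1]);
    }
    close(killer[0]);
    close(killer[1]);
    return 0;
}
```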

Greg Rogers 2009-06-16 21:40:21

Answer 5

+3 A:

Depends how it's waiting for IO.

If the thread is in the "Uninterruptible IO" state (shown as "D" in top), then there really is absolutely nothing you can do about it. Threads normally only enter this state briefly, for example while waiting for a page to be swapped in (or demand-loaded, e.g. from an mmap'd file or shared library). However, a failure (particularly of an NFS server) could cause a thread to stay in that state for longer.

There is genuinely no way of escaping from this "D" state. The thread will not respond to signals (you can send them, but they will be queued).

If it's a normal IO function such as read(), write() or a waiting function like select() or poll(), signals would be delivered normally.

MarkR 2008-10-15 06:57:03

Answer 6

+1 A:

I always add a "kill" function related to the thread function which I run before join that ensures the thread will be joinable within reasonable time. When a thread uses blocking IO I try to utilize the system to break the lock. For example, when using a socket I would have kill call shutdown(2) or close(2) on it which would cause the network stack to terminate it cleanly.
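As an illustration of the "kill" step for a socket, here is a small sketch using a socketpair so it needs no network. The function names are hypothetical, and the behaviour shown is what I would expect on Linux; other systems may differ, and a plain close() from another thread is less predictable than shutdown().

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

static void *sock_reader(void *arg)
{
    int fd = *(int *)arg;
    char buf[64];

    /* Blocks until data arrives, the peer closes, or the socket is shut down. */
    while (recv(fd, buf, sizeof buf, 0) > 0)
        ;
    return NULL;
}

/* Returns 0 if shutdown() released the blocked reader so it could be joined. */
int shutdown_demo(void)
{
    int sv[2];
    pthread_t tid;
    struct timespec settle = { 0, 100 * 1000 * 1000 };   /* 100 ms */
    int rc;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    if (pthread_create(&tid, NULL, sock_reader, &sv[0]) != 0)
        return -1;
    nanosleep(&settle, NULL);       /* demo only: let the reader block first */

    shutdown(sv[0], SHUT_RDWR);     /* the "kill" step: recv() returns 0 */
    rc = pthread_join(tid, NULL);

    close(sv[0]);                   /* close only after the thread is gone */
    close(sv[1]);
    return rc;
}
```

Closing the descriptor only after the join avoids the number being reused by another open() while the thread might still touch it.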

Linux's socket implementation is thread safe.

David Holm 2008-10-15 07:43:01

Answer 7

A:

Mixing signals and threads is a subtle problem on Linux, according to the various man pages. Do you use LinuxThreads or NPTL (if you are on Linux)?

I am not sure of this, but I think the signal handler affects the whole process, so either you terminate your whole process or everything continues.

You should use timed select or poll, and set a global flag to terminate your thread.

shodanex 2008-10-15 07:57:20

Answer 8

+3 A:

I too would recommend using a select or some other non-signal-based means of terminating your thread. One of the reasons we have threads is to try and get away from signal madness. That said...

Generally one uses pthread_kill() with SIGUSR1 or SIGUSR2 to send a signal to the thread. The other suggested signals--SIGTERM, SIGINT, SIGKILL--have process-wide semantics that you may not be interested in.

As for the behavior when you sent the signal, my guess is that it has to do with how you handled the signal. If you have no handler installed, the default action of that signal is applied, but in the context of the thread that received the signal. So SIGALRM, for instance, would be "handled" by your thread, but the handling would consist of terminating the process--probably not the desired behavior.

Receipt of a signal by the thread will generally break it out of a read with EINTR, unless it is truly in that uninterruptible state as mentioned in an earlier answer. But I think it's not, or your experiments with SIGALRM and SIGIO would not have terminated the process.

Is your read perhaps in some sort of a loop? If the read terminates with -1 return, then break out of that loop and exit the thread.

You can play with this very sloppy code I put together to test out my assumptions--I am a couple of timezones away from my POSIX books at the moment...

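#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

/* Modified from a signal handler, so it must be volatile sig_atomic_t. */
volatile sig_atomic_t global_gotsig = 0;

/* SA_SIGINFO handlers return void. */
void gotsig(int sig, siginfo_t *info, void *ucontext)
{
        (void)sig; (void)info; (void)ucontext;
        global_gotsig++;
}

void *reader(void *arg)
{
        char buf[32];
        ssize_t n;
        int hdlsig = (int)(intptr_t)arg;

        struct sigaction sa;
        /* sa_handler and sa_sigaction may share storage, so set only one. */
        sa.sa_sigaction = gotsig;
        sa.sa_flags = SA_SIGINFO;       /* no SA_RESTART: read() fails with EINTR */
        sigemptyset(&sa.sa_mask);

        if (sigaction(hdlsig, &sa, NULL) < 0) {
                perror("sigaction");
                return (void *)-1;
        }
        n = read(STDIN_FILENO, buf, sizeof buf);
        if (n < 0) {
                perror("read");
        } else {
                printf("Read %zd bytes\n", n);
        }
        return (void *)(intptr_t)n;
}

int main(int argc, char **argv)
{
        pthread_t tid1;
        void *ret;
        int err;
        int sig = SIGUSR1;

        if (argc == 2) sig = atoi(argv[1]);
        printf("Using sig %d\n", sig);

        /* pthread functions return an error number; they do not set errno. */
        err = pthread_create(&tid1, NULL, reader, (void *)(intptr_t)sig);
        if (err != 0) {
                fprintf(stderr, "pthread_create: %s\n", strerror(err));
                exit(1);
        }
        sleep(5);
        printf("killing thread\n");
        pthread_kill(tid1, sig);
        err = pthread_join(tid1, &ret);
        if (err != 0)
                fprintf(stderr, "pthread_join: %s\n", strerror(err));
        else
                printf("thread returned %ld\n", (long)(intptr_t)ret);
        printf("Got sig? %d\n", (int)global_gotsig);
        return 0;
}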

bog 2008-10-15 08:29:21

You are correct: the read() actually is in a while loop that checks for EINTR. Since it's in a third-party library, not my own code, I totally missed that fact, and that is the reason why a simple signal isn't doing what I expected.

Grumbel 2008-10-17 18:53:49

Answer 9

A:

I think the cleanest approach would be to have the thread wait on a condition variable in a loop.

When an I/O event is fired, the condition should be signaled.

The main thread could then just signal the condition while changing the loop predicate to false.

Something like:

pthread_mutex_lock(&mutex);
while (!_finished)
{
    pthread_cond_wait(&cond, &mutex);   /* releases the mutex while waiting */
    handleio();
}
pthread_mutex_unlock(&mutex);
cleanup();

Remember to handle wakeups properly with condition variables: they can have things such as "spurious wakeups", so the predicate must be checked again after every wakeup. I would wrap the cond_wait call in a function of your own that does this.
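A sketch of such a wait loop, written so that a spurious wakeup is harmless. The counters pending and handled are hypothetical stand-ins for a real I/O event queue and for handleio(). Unlike the snippet above, this version drains outstanding events before it exits.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t cv_mtx  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv_cond = PTHREAD_COND_INITIALIZER;
static bool cv_finished;   /* loop predicate, protected by cv_mtx */
static int  pending;       /* hypothetical count of I/O events waiting */
static int  handled;       /* how many events the worker processed */

static void *cv_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&cv_mtx);
    for (;;) {
        /* Re-check the predicate after every wakeup: wakeups may be spurious. */
        while (!cv_finished && pending == 0)
            pthread_cond_wait(&cv_cond, &cv_mtx);
        if (pending == 0)
            break;             /* finished, and nothing left to handle */
        pending--;
        handled++;             /* stand-in for handleio() */
    }
    pthread_mutex_unlock(&cv_mtx);
    return NULL;
}

/* Queues two events, asks the worker to finish, and returns how many it handled. */
int condvar_demo(void)
{
    pthread_t tid;

    if (pthread_create(&tid, NULL, cv_worker, NULL) != 0)
        return -1;

    pthread_mutex_lock(&cv_mtx);
    pending = 2;
    cv_finished = true;        /* change the predicate under the mutex */
    pthread_cond_signal(&cv_cond);
    pthread_mutex_unlock(&cv_mtx);

    if (pthread_join(tid, NULL) != 0)
        return -1;
    return handled;
}
```

Note that this only helps if something else turns blocking I/O into events; a thread stuck inside read() never reaches the wait.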

Nicholas Mancuso 2008-10-15 13:58:03

Answer 10

A:

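struct pollfd pfd;
struct timespec ts;
pfd.fd = socket;
pfd.events = POLLIN | POLLHUP | POLLERR;
pthread_mutex_lock(&lock);
while (thread_alive)
{
    int ret = poll(&pfd, 1, 100);   /* wait up to 100 ms for input */
    if (ret == 1)
    {
        // handle IO
    }
    else
    {
        /* pthread_cond_timedwait() takes an absolute deadline: now + 100 ms */
        clock_gettime(CLOCK_REALTIME, &ts);
        ts.tv_nsec += 100 * 1000 * 1000;
        if (ts.tv_nsec >= 1000000000L)
        {
            ts.tv_sec++;
            ts.tv_nsec -= 1000000000L;
        }
        pthread_cond_timedwait(&cond, &lock, &ts);
    }
}
pthread_mutex_unlock(&lock);

thread_alive is a thread-specific variable that can be used in combination with the signal to kill the thread.

As for the "handle IO" section, you need to make sure the descriptor does not block: open it with the O_NONBLOCK flag or, if it's a socket, pass the MSG_DONTWAIT flag to recv(). For other fds I'm not sure.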

luke 2008-10-15 16:07:22

Answer 11

A:

I'm surprised that nobody has suggested pthread_cancel. I recently wrote a multi-threaded I/O program, and calling pthread_cancel() followed by pthread_join() worked just great.

I had originally tried the pthread_kill() but ended up just terminating the entire program with the signals I tested with.
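For reference, a minimal sketch of cancel-then-join in plain C, assuming the default deferred cancellation type. The function names are made up, and a pipe stands in for the real descriptor. read() is a cancellation point, so the blocked thread is cancelled there, and the cleanup handler closes its descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <unistd.h>

/* Cleanup handler: runs if the thread is cancelled (or on pop with a non-zero arg). */
static void close_fd(void *arg)
{
    close(*(int *)arg);
}

static void *cancellable_reader(void *arg)
{
    int fd = *(int *)arg;
    char buf[64];

    pthread_cleanup_push(close_fd, &fd);
    while (read(fd, buf, sizeof buf) > 0)   /* read() is a cancellation point */
        ;
    pthread_cleanup_pop(1);                 /* also run the cleanup on normal exit */
    return NULL;
}

/* Returns 0 if the thread was cancelled while blocked and then joined. */
int cancel_demo(void)
{
    int fds[2];
    pthread_t tid;
    void *ret;

    if (pipe(fds) != 0)
        return -1;
    if (pthread_create(&tid, NULL, cancellable_reader, &fds[0]) != 0)
        return -1;
    if (pthread_cancel(tid) != 0)
        return -1;
    if (pthread_join(tid, &ret) != 0)
        return -1;
    close(fds[1]);              /* fds[0] was closed by the cleanup handler */
    return ret == PTHREAD_CANCELED ? 0 : 1;
}
```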

HUAGHAGUAH 2008-10-25 23:26:06

Answer 12

+2 A:

A similar problem and possible solutions are discussed there: File Descriptors And Multithreaded Programs

dmityugov 2008-11-10 12:58:48

The article gave me exactly what I was looking for--shutdown(fd, SHUT_RDWR);. Thanks.

ShaChris23 2010-06-28 19:17:51

Answer 13

A:

If you're blocking in a third-party library that loops on EINTR, you might want to consider a combination of using pthread_kill with a signal (USR1 etc) calling an empty function (not SIG_IGN) with actually closing/replacing the file descriptor in question. By using dup2 to replace the fd with /dev/null or similar, you'll cause the third-party library to get an end-of-file result when it retries the read.

Note that by dup()ing the original socket first, you can avoid needing to actually close the socket.
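A rough sketch of that combination. stubborn_reader is a hypothetical stand-in for the third-party loop, and a pipe stands in for the socket. The order matters: the descriptor is replaced first and the signal is sent second, so the retried read() already sees /dev/null.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Empty handler: it exists only so the signal interrupts read() with EINTR. */
static void on_wake(int sig) { (void)sig; }

/* Stand-in for library code that blindly retries on EINTR. */
static void *stubborn_reader(void *arg)
{
    int fd = *(int *)arg;
    char buf[64];
    ssize_t n;

    do {
        n = read(fd, buf, sizeof buf);
    } while (n > 0 || (n < 0 && errno == EINTR));
    return NULL;                /* reached on end-of-file or a real error */
}

/* Returns 0 if the reader saw end-of-file and could be joined. */
int dup2_demo(void)
{
    int fds[2], devnull, rc;
    pthread_t tid;
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_wake;    /* sa_flags stays 0: no SA_RESTART */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0 || pipe(fds) != 0)
        return -1;
    if (pthread_create(&tid, NULL, stubborn_reader, &fds[0]) != 0)
        return -1;

    /* 1. Replace the descriptor: later read() calls on fds[0] hit /dev/null. */
    devnull = open("/dev/null", O_RDONLY);
    if (devnull < 0 || dup2(devnull, fds[0]) < 0)
        return -1;
    close(devnull);

    /* 2. Interrupt a read() that is still blocked on the old pipe. */
    pthread_kill(tid, SIGUSR1);

    rc = pthread_join(tid, NULL);
    close(fds[0]);
    close(fds[1]);
    return rc;
}
```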

bdonlan 2009-06-16 20:28:33

Answer 14

+2 A:

The canonical way to do this is with pthread_cancel, where the thread has used pthread_cleanup_push/pop to provide cleanup for any resources it is using. Unfortunately this can NOT be used in C++ code, ever. Any C++ standard library code, or ANY try {} catch() on the calling stack at the time of pthread_cancel, can potentially segfault, killing your whole process.

The only workaround is to handle SIGUSR1 by setting a stop flag, send it with pthread_kill(SIGUSR1), and then, anywhere the thread is blocked on I/O, check the stop flag when you get EINTR before retrying the I/O. In practice this does not always succeed on Linux; I don't know why.

But in any case it's useless to talk about if you have to call any third-party library, because it will most likely have a tight loop that simply restarts I/O on EINTR. Reverse engineering its fd to close it won't cut it either; it could be waiting on a semaphore or other resource. In this case it is simply impossible to write working code, period.

Yes, this is utterly brain-damaged. Talk to the guys who designed C++ exceptions and pthread_cancel. Supposedly this may be fixed in some future version of C++. Good luck with that.
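A sketch of the stop-flag workaround in C, for code you control. All names are hypothetical. One possible reason a single signal is unreliable is a race: if it lands after the flag check but before read() is entered, the flag is set yet the read still blocks. The sketch below works around that by re-sending the signal until the thread reports that it is leaving; that is my assumption about the failure, not something established above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t stop_flag;   /* set by the SIGUSR1 handler */
static atomic_int reader_done;            /* set by the thread just before it exits */

static void on_stop(int sig) { (void)sig; stop_flag = 1; }

static void *flag_reader(void *arg)
{
    int fd = *(int *)arg;
    char buf[64];

    while (!stop_flag) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n == 0 || (n < 0 && errno != EINTR))
            break;              /* end-of-file or a real error */
        /* n > 0: handle the data; EINTR: the loop re-checks stop_flag */
    }
    atomic_store(&reader_done, 1);
    return NULL;
}

/* Returns 0 if the reader noticed the stop request and could be joined. */
int stop_flag_demo(void)
{
    int fds[2], rc;
    pthread_t tid;
    struct sigaction sa;
    struct timespec pause_10ms = { 0, 10 * 1000 * 1000 };

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_stop;    /* sa_flags stays 0: no SA_RESTART */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0 || pipe(fds) != 0)
        return -1;
    if (pthread_create(&tid, NULL, flag_reader, &fds[0]) != 0)
        return -1;

    /* Re-send until the thread says it is leaving, in case one signal
       arrived between the flag check and the read() call. */
    while (!atomic_load(&reader_done)) {
        pthread_kill(tid, SIGUSR1);
        nanosleep(&pause_10ms, NULL);
    }
    rc = pthread_join(tid, NULL);
    close(fds[0]);
    close(fds[1]);
    return rc;
}
```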

qqq 2010-09-26 22:08:01
