Hi all,

I've got a C program I'm writing. Here's what it does:

  • Create n fifos using mkfifo
  • Open them for read (with the O_NONBLOCK flag set)
  • Open them for write
  • Spawn a thread

In the thread, run in a loop:

  • Create an fd_set of the file descriptors for all n fifos
  • Call select(n, &my_set, NULL, NULL, NULL)
  • For each fd ready for I/O (FD_ISSET(fd, &my_set)):
    • Read a string from the fd (read(fd, buf, buf_len))
    • Print the string
    • If string == "kill", mark the fd as dead and remove it from the list (n--)
    • If n == 0, terminate the thread

In the main program:

  • For i = 0 to n
    • Write to fds[i] with a string (write(fds[i], buf, buf_len))
  • For i = 0 to n
    • Write to fds[i] with the string "kill"
  • Join on the thread I created
  • Exit

The behavior I'm seeing is that select() returns once with a value of 1, reporting only the first fd in the list as ready. The second time through the loop, select() blocks forever.

Here's my output:

thread created
Waiting on 4 file descriptors
> Wrote 'Hello to target 0 from writer 0' to 0
> Wrote 'Hello to target 0 from writer 1' to 1
> Wrote 'Hello to target 1 from writer 0' to 2
> Wrote 'Hello to target 1 from writer 1' to 3
> Sending kill to 0:0 (#0)
> Sending kill to 0:1 (#1)
> Sending kill to 1:0 (#2)
> Sending kill to 1:1 (#3)
< Got string: 'Hello to target 0 from writer 0'
Waiting on 4 file descriptors
^C

The OS is Linux, in case it matters.

Link to the code: https://dl.getdropbox.com/u/188590/fifotest.c (Sorry it's a bit heinous)

Thanks, Nathan

+2  A: 

The first parameter in the call to select() should be the highest-numbered file descriptor plus 1, not the number of file descriptors in the fd_set.

Here's what I changed to fix this issue:

--- fifotest-1.c        2009-05-22 23:44:03.000000000 -0400
+++ fifotest.c  2009-05-22 23:34:00.000000000 -0400
@@ -34,19 +34,22 @@
     sim_arg_t* ifs = arg;
     uint32_t num_ifs;
     uint32_t select_len;
+    int maxfd;

        num_ifs = ifs->num_ifs;
     while (num_ifs > 0) {
                FD_ZERO (&set);
                select_len = 0;
-               for (i = 0; i < ifs->num_ifs; ++i) {
+               for (maxfd=0, i = 0; i < ifs->num_ifs; ++i) {
                        if (ifs->if_list[i].valid) {
                                FD_SET(ifs->if_list[i].fh, &set);
-                               ++select_len;
+                               if (ifs->if_list[i].fh > maxfd)
+                                   maxfd = ifs->if_list[i].fh;
+                               select_len++;
                        }
                }
                printf("Waiting on %d file descriptors\n", select_len);
-               ret = select(select_len, &set, NULL, NULL, NULL);
+               ret = select(maxfd+1, &set, NULL, NULL, NULL);
                if (ret < 0) {
                        fprintf(stderr, "Select returned error!\n");
                        continue;
Lance Richardson
That's the first problem...there are then some accounting issues to deal with...
Jonathan Leffler
OK, so that explains one out of my two problems (why I was only getting I/O from a single fifo out of four). Thanks for that! Now all four fifos output the first message, but they all hang waiting for the kill.
Nathan
Hmm, after fixing up this issue and actually running the code, it appears to work - I see "Sending kill to" and "Got string" for 0:0, 0:1, 1:0, and 1:1, and the program exits after printing "Done with listener". Perhaps, as Jonathan mentioned, the execution is non-deterministic and I'm just getting lucky?
Lance Richardson
That's probably it. Thanks for your help on this problem. I appreciate it :)
Nathan
OK, it worked perfectly the first 5 times, then I started seeing hangs - non-deterministic as Jonathan suggested. (There's a moral about testing here somewhere... :-)
Lance Richardson
+4  A: 

As Lance Richardson said, the first problem is that you need to pass the number of the maximum file descriptor plus one, not the number of file descriptors.
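
For illustration, here is a minimal sketch of the "highest descriptor plus one" rule, written independently of fifotest.c. The helper names highest_fd and wait_readable and the timeout parameter are assumptions made for this example, not part of the original program. Passing NULL as the timeout blocks indefinitely, as the original call does.

```c
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical helper: largest descriptor in fds, or -1 if count is 0. */
int highest_fd(const int *fds, size_t count)
{
    int maxfd = -1;
    for (size_t i = 0; i < count; ++i) {
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    return maxfd;
}

/*
 * Hypothetical helper: wait until at least one descriptor in fds is readable.
 * Assumes every descriptor is below FD_SETSIZE.
 * Returns whatever select() returns; on success, ready holds the readable fds.
 */
int wait_readable(const int *fds, size_t count, fd_set *ready,
                  struct timeval *timeout)
{
    FD_ZERO(ready);
    for (size_t i = 0; i < count; ++i)
        FD_SET(fds[i], ready);

    /* First argument is the highest fd plus one, not the number of fds. */
    return select(highest_fd(fds, count) + 1, ready, NULL, NULL, timeout);
}
```

With four fifos whose descriptors happen to be 3, 4, 5 and 6, this passes 7 to select(), whereas passing the count (4) would make select() look only at descriptors 0 through 3.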

You then have to clean up the housekeeping in the listener thread - I got most of the data, but ended up listening to 6 file descriptors, not 4. (The reported number was now the largest fd, not the number of file descriptors.)

You also have a problem in that you write a string plus a null byte to each pipe, then a second string plus a null byte. Since the scheduling is non-deterministic, the main program can get to write both of its strings to each fifo before the listener thread runs, so a single read() in the listener returns both strings. When I printed out lengths, I got a total length of 41 read (read_len), but the length of the string per strlen() was 31. In other words, the first read included the 'kill' part, but you didn't notice in the printout because the null byte at the end of the first message stopped the printing there. Hence the listener kept waiting for a 'kill' message that it had already consumed.
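
A sketch of one way to handle this, assuming each message is terminated by a null byte as described above: walk through all the bytes that read() returned instead of treating the buffer as a single C string. The function name scan_messages and its signature are hypothetical, chosen for this example only.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: scan the first len bytes of buf, which may hold
 * several null-terminated messages from one read(), and return 1 if any
 * complete message equals "kill", otherwise 0.
 * *count receives the number of complete messages found.
 */
int scan_messages(const char *buf, size_t len, size_t *count)
{
    int saw_kill = 0;
    size_t pos = 0;

    *count = 0;
    while (pos < len) {
        /* Look for the terminator only within the bytes actually read. */
        const char *end = memchr(buf + pos, '\0', len - pos);
        if (end == NULL)
            break;  /* incomplete message: its terminator has not arrived */

        if (strcmp(buf + pos, "kill") == 0)
            saw_kill = 1;
        ++*count;

        pos = (size_t)(end - buf) + 1;  /* continue after the null byte */
    }
    return saw_kill;
}
```

Called with the return value of read() as len, this finds a "kill" that arrived in the same read as the greeting. It deliberately ignores a trailing fragment with no terminator; a fuller version would keep that fragment and prepend it to the next read.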

Jonathan Leffler
Can you elaborate on "clean up the housekeeping"? How did you wind up listening on 6 fds? Your third paragraph explains my other problem. I didn't realize that fifos worked like stream sockets. Thanks for the help!
Nathan
My mistake - edited out.
Jonathan Leffler
Roger. Thanks for the help. This long weekend will be a lot nicer now that I understand what's going on here :)
Nathan
I got to see various execution patterns, sometimes seeing both messages read in one call to read(), sometimes seeing two calls for a given file descriptor. So, on my machine (an Ubuntu 8.04 VMware image running under Windows XP SP2; Intel Core Duo chip), the execution was variable. But with a loop to search through the buffer for each null-terminated string, I got the kills, and the thread exited, and so did main(). You were very close - good use of SO.
Jonathan Leffler