ansaurus

Question

GCC on HP-UX, lots of poll(), pipe(), and file issues

Answer 1


The real problems:

1st (but minor) Problem

    struct pollfd pollArray[2] = {{0, POLLIN, 0}, {childOutPipe[0], POLLIN, 0}};

You are making possibly unwarranted assumptions about the order and contents of 'struct pollfd'. All the standard says is that it contains (at least) three members; it says nothing about the order in which they appear.

The <poll.h> header shall define the pollfd structure, which shall include at least the following members:

    int    fd       The following descriptor being polled.
    short  events   The input event flags (see below).
    short  revents  The output event flags (see below).

Since you're using C99, use designated initializers, which name each member and so do not depend on the order in which the members are declared:

    struct pollfd pollArray[2] =
    {
        { .fd = 0,               .events = POLLIN, .revents = 0 },
        { .fd = childOutPipe[0], .events = POLLIN, .revents = 0 },
    };

You could replace the 0 for standard input with STDIN_FILENO from <unistd.h>.
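A sketch of the same initialization with STDIN_FILENO in place of the literal 0. init_poll_array() is a hypothetical helper name; the compound literals with designated initializers are plain C99:

```c
#include <poll.h>    /* struct pollfd, POLLIN */
#include <unistd.h>  /* STDIN_FILENO */

/* Hypothetical helper: fill a two-entry poll array that watches standard
 * input and the read end of the child's output pipe. Each member is named,
 * so the result is correct whatever order the implementation declares
 * fd, events and revents in. */
static void init_poll_array(struct pollfd fds[2], int child_out_fd)
{
    fds[0] = (struct pollfd){ .fd = STDIN_FILENO, .events = POLLIN, .revents = 0 };
    fds[1] = (struct pollfd){ .fd = child_out_fd, .events = POLLIN, .revents = 0 };
}
```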

2nd (the major) Problem

    nfds_t nfds = sizeof(pollArray);

sizeof(pollArray) is the size of the array in bytes, probably 16 on most (but not all) 32-bit and 64-bit machines. You need the dimension of the poll array, i.e. its number of elements (which is 2). This is why all hell breaks loose; the system is looking at garbage and getting confused.

Addressing a comment:

To find the dimension of an array defined in the local file or function (but not an array parameter passed into a function, nor an array defined in another file), use a variant of the macro:

    #define DIM(x) (sizeof(x)/sizeof(*(x)))

This name harks back to the use of BASIC in the dim, distant past; other names I've seen are NELEMS or ARRAY_SIZE or DIMENSION (harking back to Fortran IV), and I'm sure there are lots of others.
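A small sketch contrasting the byte count with the element count for a locally defined array; the two function names are hypothetical:

```c
#include <poll.h>    /* struct pollfd */
#include <stddef.h>  /* size_t */

/* Number of elements in an array whose definition is in scope.
 * Do not apply it to an array parameter: that is really a pointer,
 * so the result would be sizeof(pointer) / sizeof(element). */
#define DIM(x) (sizeof(x)/sizeof(*(x)))

/* Byte count: what the original code passed to poll() as nfds. */
static size_t poll_array_bytes(void)
{
    struct pollfd pollArray[2];
    return sizeof(pollArray);
}

/* Element count: what poll() actually expects as nfds. */
static size_t poll_array_dim(void)
{
    struct pollfd pollArray[2];
    return DIM(pollArray);
}
```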

What's happening is that because you are not setting nfds to 2, the system call is reading data after the actual struct pollfd array, and trying to make head or tail of stuff that is not a struct pollfd. In particular, it is probably writing into what you've told it is the revents field of a row in the struct pollfd array, but the actual space holds your log FILE * pointer, so that is completely screwed up. Similarly for other local variables. In other words, you've got a stack buffer overflow - aka Stack Overflow, a name that should be faintly familiar. But it is happening because you programmed it.

Fix:

    nfds_t nfds = DIM(pollArray);
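Putting the first two fixes together, a minimal sketch of a single poll() call over two pipe read ends. poll_two() is a hypothetical name, and the 1 ms timeout simply mirrors the original call:

```c
#include <poll.h>    /* poll(), struct pollfd, nfds_t, POLLIN */
#include <unistd.h>  /* pipe(), write() for callers */

#define DIM(x) (sizeof(x)/sizeof(*(x)))

/* Hypothetical helper: poll two descriptors once for input and copy
 * each entry's revents into out[]. Returns poll()'s result. */
static int poll_two(int fd0, int fd1, short out[2])
{
    struct pollfd pollArray[2] =
    {
        { .fd = fd0, .events = POLLIN, .revents = 0 },
        { .fd = fd1, .events = POLLIN, .revents = 0 },
    };
    nfds_t nfds = DIM(pollArray);      /* 2 elements, not 16 bytes */
    int rc = poll(pollArray, nfds, 1);
    out[0] = pollArray[0].revents;
    out[1] = pollArray[1].revents;
    return rc;
}
```

With one byte waiting in the second pipe and nothing in the first, poll() reports exactly one ready descriptor.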

3rd (medium grade) problem

    poll(pollArray, nfds, 1);
    if (errcode < 0) {

The result of poll() is not saved, and the variable errcode is never assigned a value, yet you check what the value is immediately afterwards. The corrected code would probably read:

    errcode = poll(pollArray, nfds, 1);
    if (errcode < 0)
    {
        fprintf(stderr, "POLL returned with error %d!\n", errcode);
        eofFlag = 1;
    }

Note the newline character added to the error message - you need it. Or:

    if (poll(pollArray, nfds, 1) < 0)
    {
        int errnum = errno;
        fprintf(stderr, "POLL returned with error (%d: %s)\n",
                errnum, strerror(errnum));
        eofFlag = 1;
    }

In the second case, you'd add '#include <errno.h>' and '#include <string.h>' (for strerror()) to the header list. Saving the value of errno preserves it against change by function calls - but you can only reliably test errno when a function (system call) has failed. Even successful function calls may leave errno non-zero. (For example, on some systems, if stderr is not going to a terminal, the value of errno after an I/O call is ENOTTY, even though the call as a whole succeeded.)
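The second variant can be packaged as a small wrapper. checked_poll() is a hypothetical name, and eofFlag is passed by pointer on the assumption that the caller owns it:

```c
#include <errno.h>   /* errno */
#include <poll.h>    /* poll(), struct pollfd, nfds_t */
#include <stdio.h>   /* fprintf() */
#include <string.h>  /* strerror() */

/* Hypothetical wrapper: on failure, save errno immediately (fprintf()
 * may change it), report it on stderr and set *eofFlag. errno is only
 * inspected because poll() returned -1. */
static int checked_poll(struct pollfd *fds, nfds_t nfds, int timeout, int *eofFlag)
{
    int rc = poll(fds, nfds, timeout);
    if (rc < 0)
    {
        int errnum = errno;
        fprintf(stderr, "POLL returned with error (%d: %s)\n",
                errnum, strerror(errnum));
        *eofFlag = 1;
        errno = errnum;   /* restore it for the caller */
    }
    return rc;
}
```

A negative fd is ignored by poll(), so a zero-timeout call on such an entry succeeds and leaves the flag alone.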

Previous ruminations

Some prior thoughts on what might be the problem; I think there is still some useful info down here.

~~I suspect your problem is that poll() 'damages' the set of polled descriptors, and you have to rebuild it on each loop.~~ (Having checked the manual page at the Open Group, it appears that poll() does not have the problems that select() suffers from.) This is certainly a problem with the related select() system call.
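To illustrate the distinction: poll() reports results only through revents and leaves fd and events alone, so the same array can be passed straight back on the next loop iteration, whereas select() overwrites its fd_set arguments, which therefore have to be rebuilt before every call. A minimal check, with a hypothetical function name:

```c
#include <poll.h>  /* poll(), struct pollfd, POLLIN */

/* Hypothetical check: call poll() twice on the same entry and confirm
 * that the request (fd and events) survives. Returns 1 if it does,
 * 0 if it was modified, -1 if poll() failed. */
static int poll_keeps_request(int fd)
{
    struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };
    for (int i = 0; i < 2; i++)
    {
        if (poll(&p, 1, 0) < 0)
            return -1;
        if (p.fd != fd || p.events != POLLIN)
            return 0;
    }
    return 1;
}
```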

Your child code is not closing all the file descriptors when it should - you've commented out one 'close()' and there is another missing altogether. When the child has finished connecting pipes to standard input and output, you don't want the un-dupped file descriptors still open; a reader only sees EOF on a pipe once every copy of its write end is closed, so otherwise the processes cannot detect EOF properly.
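A sketch of the descriptor bookkeeping, assuming the child program is started with execlp(); spawn_child() and its parameters are hypothetical names, not from the original code:

```c
#include <sys/types.h>  /* pid_t */
#include <unistd.h>     /* pipe(), fork(), dup2(), close(), execlp(), _exit() */

/* Hypothetical helper: run cmd with its stdin/stdout connected to pipes.
 * *to_child gets the fd the parent writes to, *from_child the fd it
 * reads from. Returns the child's pid, or -1 on error. */
static pid_t spawn_child(const char *cmd, int *to_child, int *from_child)
{
    int inPipe[2], outPipe[2];          /* [0] = read end, [1] = write end */
    if (pipe(inPipe) != 0)
        return -1;
    if (pipe(outPipe) != 0)
    {
        close(inPipe[0]);
        close(inPipe[1]);
        return -1;
    }
    pid_t pid = fork();
    if (pid < 0)
    {
        close(inPipe[0]);  close(inPipe[1]);
        close(outPipe[0]); close(outPipe[1]);
        return -1;
    }
    if (pid == 0)
    {
        /* Child: connect the pipes to standard input and output... */
        if (dup2(inPipe[0], STDIN_FILENO) < 0 || dup2(outPipe[1], STDOUT_FILENO) < 0)
            _exit(127);
        /* ...then close ALL the originals. Keeping inPipe[1] open would
         * mean the child never sees EOF on its own standard input. */
        close(inPipe[0]);
        close(inPipe[1]);
        close(outPipe[0]);
        close(outPipe[1]);
        execlp(cmd, cmd, (char *)NULL);
        _exit(127);                     /* exec failed */
    }
    /* Parent: close the ends the child uses, so read() on outPipe[0]
     * returns 0 once the child has exited. */
    close(inPipe[0]);
    close(outPipe[1]);
    *to_child = inPipe[1];
    *from_child = outPipe[0];
    return pid;
}
```

For example, with cmd set to "cat", writing to *to_child and then closing it makes read() on *from_child return 0 once cat has echoed everything and exited; that only happens because every stray write end has been closed.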

Similar comments may apply in the parent.

Also, note that the sending process might need to send multiple packets of data to the child before anything appears on the child's standard output. As an extreme case, consider 'sort'; that reads all its data before generating any output. ~~I worry about the direction switching code, therefore, though I've not fully digested what it does.~~ Of itself, the direction switching is harmless - it simply writes the new direction when it starts writing in the opposite direction from last time.

More seriously, don't use single character reads and writes; read buffers of a sensible size. Sensible size might be almost any power of two between 256 and 8192; you could choose other sizes at liberty (the size of the pipe buffer might be a good size to choose). Handling multiple characters at a time will vastly improve the performance.
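A sketch of a buffered copy loop, assuming blocking descriptors. The names copy_fd() and write_all() and the 4096-byte buffer are illustrative choices, not taken from the original program:

```c
#include <errno.h>      /* errno, EINTR */
#include <sys/types.h>  /* ssize_t */
#include <unistd.h>     /* read(), write() */

enum { BUFSIZE = 4096 };  /* any power of two from 256 to 8192 is reasonable */

/* Write all n bytes, looping over partial writes and EINTR. */
static int write_all(int fd, const char *buf, size_t n)
{
    while (n > 0)
    {
        ssize_t w = write(fd, buf, n);
        if (w < 0)
        {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Copy from in to out a buffer at a time until EOF.
 * Returns the number of bytes copied, or -1 on error. */
static long copy_fd(int in, int out)
{
    char buf[BUFSIZE];
    long total = 0;
    for (;;)
    {
        ssize_t r = read(in, buf, sizeof(buf));
        if (r == 0)
            return total;               /* EOF */
        if (r < 0)
        {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(out, buf, (size_t)r) != 0)
            return -1;
        total += r;
    }
}
```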

The way I have solved similar issues is by having two processes doing the monitoring, one for the standard input and the other for standard output - or the equivalents. This means that I don't need to use poll() (or select()) at all. The process handling the standard input reads and blocks waiting for more information; when something arrives, it logs it and writes it to the child's standard input. Similarly for the process handling standard output.
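A minimal sketch of one such monitor process, assuming blocking pipes and a log descriptor the caller has already opened; monitor(), start_monitor() and put_all() are hypothetical names. One would be started for the child's standard input and another for its standard output. Each forked monitor inherits every open descriptor, so the caller must make sure it does not keep write ends it should not hold, or it will never see EOF:

```c
#include <errno.h>      /* errno, EINTR */
#include <sys/types.h>  /* pid_t, ssize_t */
#include <sys/wait.h>   /* waitpid() for callers */
#include <unistd.h>     /* fork(), read(), write(), _exit() */

/* Write every byte of buf, retrying partial writes and EINTR. */
static int put_all(int fd, const char *buf, size_t n)
{
    while (n > 0)
    {
        ssize_t w = write(fd, buf, n);
        if (w < 0)
        {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Body of one monitor: block in read() on `in`, then copy each chunk
 * to the log and on to `out`. Returns 0 at EOF, -1 on error. */
static int monitor(int in, int out, int logfd)
{
    char buf[4096];
    for (;;)
    {
        ssize_t r = read(in, buf, sizeof(buf));
        if (r == 0)
            return 0;
        if (r < 0)
        {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (put_all(logfd, buf, (size_t)r) != 0 || put_all(out, buf, (size_t)r) != 0)
            return -1;
    }
}

/* Fork a process that runs monitor(); returns its pid, or -1 if fork() failed. */
static pid_t start_monitor(int in, int out, int logfd)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(monitor(in, out, logfd) == 0 ? 0 : 1);
    return pid;
}
```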

I can dig out the code that works with pipes if you need it (see my profile). I looked at it a year or two ago (hmmm; last edits in 2005 in fact, though I recompiled it in 2007) and it was still in working order (it was written circa 1989). I also have code that works on sockets instead of pipes. They'd need some adaptation to suit your requirements; they were rather specialized (and the pipe version, in particular, is aware of a client-server database protocol and attempts to handle complete packets of information).

Jonathan Leffler 2009-06-23 00:13:53

Holy cow. When I went home yesterday, I was thinking this would be one of those 'lost in the ether' questions, and then I get in this morning to this! Tremendous thanks; I'll get to work fixing those and let you know how it goes.

SparroHawc 2009-06-23 17:26:53

Alright... I've tried several different iterations of sizeof, as well as looking up different implementations - and poll() still makes me lose my logFile handle. The moment I poll(), nothing can be written to the logfile; it's like it was fclose()d prematurely. I'm thinking I may refactor this thing to use two watcher threads, as you mentioned you've done in the past. At least then I won't have to keep fighting with this thing.

SparroHawc 2009-06-23 18:25:07

A few edits later, and my listing above is completely out of date... The file handle doesn't get lost any more, thanks to your noting that I'm supposed to be using an item count instead of a byte-count sizeof, but the same problem still happens with my from-child poll returning a POLLERR. I'm wondering if the pipe widows because of the program terminating, with data still waiting to write from it.

SparroHawc 2009-06-23 20:02:11

Probably time to write a new question, with just the current problem, and a subject such as "Why does poll() return POLLERR in this case?".

Jonathan Leffler 2009-06-23 21:12:12

Good suggestion. Thanks again for all your help!

SparroHawc 2009-06-23 21:18:37
