tags:

views:

451

answers:

1

I'm having a lot of trouble building a 'middleman' logger - the intention is to place it on the path above an item in /usr/bin and capture everything going to and from the application. (Black box 3rd-party app is failing FTP for some reason.) Once run, the middleman will fork, redirect stdout and stdin to/from pipes that the parent has control of, and then execute the program in /usr/bin. (Hardcoded; yes, I know, I'm bad.)

However, once I run poll(), things get weird. I lose the handle to my logfile, the poll on the output pipe from the child throws an error, cats and dogs start living together, et cetera.

Can anyone shed some light on this?

Here's what I currently have... The poll() in question is marked with non-indented comments for ease of location.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <poll.h>
#include <time.h>
#include <sys/types.h>
#include <fcntl.h>

#define MAX_STR_LEN 1024
static int directionFlag; /* 0 = input, 1 = output */
static int eofFlag;

/* Splits the next char from the stream inFile, with extra
information logged if directionFlag swaps */
void logChar(int inFilDes, int outFilDes, FILE *logFile, int direction)
{
    char inChar = 0;
    if(read(inFilDes, &inChar, sizeof(char)) > 0)
    {

     if(direction != directionFlag)
     {
      directionFlag = direction;
      if(direction)
      {
       fprintf(logFile, "\nOUTPUT: ");
      } else {
       fprintf(logFile, "\nINPUT: ");
      }
     }

     write(outFilDes, &inChar, sizeof(char));
     fputc(inChar, stderr);
     fputc(inChar, logFile);
    } else {
     eofFlag = 1;
    }
    return;
}

int main(int argc, char* argv[])
{
    pid_t pid;

    int childInPipe[2];
    int childOutPipe[2];

    eofFlag = 0;

    /* [0] is input, [1] is output*/

    if(pipe(childInPipe) < 0 || pipe(childOutPipe) < 0) {
     fprintf(stderr,"Pipe error; aborting\n");
      exit(1);
    }

    if((pid = fork()) == -1){
     fprintf(stderr,"Fork error; aborting\n");
     exit(1);
    }

    if(pid)
    {
     /*Parent process*/

     int i;
     int errcode;
     time_t rawtime;
     struct tm * timeinfo;
     time(&rawtime);
     timeinfo=localtime(&rawtime);

     struct pollfd pollArray[2] = {
      { .fd = 0, .events = POLLIN, .revents = 0 },
      { .fd = childOutPipe[0], .events = POLLIN, .revents = 0 }
     };
     /* Yet again, 0 = input, 1 = output */

     nfds_t nfds = sizeof(struct pollfd[2]);

     close(childInPipe[0]);
     close(childOutPipe[1]);

     /* We don't want to change around the streams for this one,
     as we will be logging everything - and I do mean everything */

     FILE *logFile;
     if(!(logFile = fopen("/opt/middleman/logfile.txt", "a"))) {
      fprintf(stderr, "fopen fail on /opt/middleman/logfile.txt\n");
      exit(1);
     }

     fprintf(logFile, "Commandline: ");

     for(i=0; i < argc; i++)
     {
      fprintf(logFile, "%s ", argv[i]);
     }
     fprintf(logFile, "\nTIMESTAMP: %s\n", asctime(timeinfo));

     while(!eofFlag)
     {

// RIGHT HERE is where things go to pot
      errcode = poll(pollArray, nfds, 1);
// All following fprintf(logfile)s do nothing
      if(errcode < 0) {
       fprintf(stderr, "POLL returned with error %d!", errcode);
       eofFlag = 1;
      }
      if((pollArray[0].revents && POLLERR) & errno != EAGAIN ) {
       fprintf(stderr, "POLL on input has thrown an exception!\n");
       fprintf(stderr, "ERRNO value: %d\n", errno);
       fprintf(logFile, "POLL on input has thrown an exception!\n");
       eofFlag = 1;
      } else if(pollArray[0].revents && POLLIN) {
       logChar(pollArray[0].fd, childInPipe[1], logFile, 0);
      } else if((pollArray[1].revents && POLLERR) & errno != EAGAIN ) {
       fprintf(stderr, "POLL on output has thrown an exception!\n");
       fprintf(stderr, "ERRNO value: %d\n", errno);
       fprintf(logFile, "POLL on output has thrown an exception!\n");
       eofFlag = 1;
      } else if(pollArray[1].revents && POLLIN) {
       logChar(pollArray[1].fd, 1, logFile, 1);
      }

     }

     fclose(logFile);

    }
    else
    {
     /*Child process; switch streams and execute application*/
     int i;
     int catcherr = 0;
     char stmt[MAX_STR_LEN] = "/usr/bin/";

     close(childInPipe[1]);
     close(childOutPipe[0]);

     strcat(stmt, argv[0]);

     if(dup2(childInPipe[0],0) < 0) {
      fprintf(stderr, "dup2 threw error %d on childInPipe[0] to stdin!\n", errno);
     }
//   close(childInPipe[0]);

     if(dup2(childOutPipe[1],1) < 0)
     {
      fprintf(stderr, "dup2 threw error %d on childInPipe[1] to stdout!\n", errno);
     }

     /* Arguments need to be in a different format for execv */
     char* args[argc+1];
     for(i = 0; i < argc; i++)
     {
      args[i] = argv[i];
     }
     args[i] = (char *)0;

     fprintf(stderr, "Child setup complete, executing %s\n", stmt);
     fprintf(stdout, "Child setup complete, executing %s\n", stmt);

     if(execv(stmt, args) == -1) {
      fprintf(stderr, "execvP error!\n");
      exit(1);
     }
    }
    return 0;
}


EDIT 6/23/09 12:20PM

After the fixes, I have attempted to run 'banner' through this program, and here's the output I get...

Child setup complete, executing /usr/bin/banner
POLL on output has thrown an exception!
ERRNO value: 0

The logfile has the following:

Commandline: banner testing 
TIMESTAMP: Tue Jun 23 11:21:00 2009

The reason ERRNO has a 0 in it is because poll() returns just fine; it's the pollArray[1].revents that came back with an error, which means childOutPipe[0] polled as having an error. logChar(), as far as I can tell, never gets called.

I'm going to try splitting out poll() into two different calls.


Okay, the moment I poll() - even on stdin, which doesn't return with an error message - it kills my ability to write to the logFile. Also, I discovered that the while() loop runs several times before the output poll comes back with an error on the pipe. I'm becoming increasingly convinced that poll() is simply a lost cause.
Every attempt to write to logFile fails after the poll(), even a successful poll(), with errno set to "Bad file number". This really should not be happening. I honestly cannot see how it would be affecting my file handle.
Okay, so apparently I'm a moron. Thanks for setting me straight; I was assuming nfds was a byte size, not an array size. That's fixed, and voila! It's not killing my logFile handle any more.

+3  A: 

The real problems:

1st (but minor) Problem

struct pollfd pollArray[2] = {{0, POLLIN, 0}, {childOutPipe[0], POLLIN, 0}};

You are making possibly unwarranted assumptions about the order and contents of 'struct pollfd'. All the standard says is that it contains (at least) three members; it says nothing about the order in which they appear.

The header shall define the pollfd structure, which shall include at least the following members:

int    fd       The following descriptor being polled. 
short  events   The input event flags (see below). 
short  revents  The output event flags (see below).

Since you're using C99, use the secure initialization notation:

    struct pollfd pollArray[2] =
    {
        { .fd = 0,               .events = POLLIN, .revents = 0 },
        { .fd = childOutPipe[0], .events = POLLIN, .revents = 0 },
    };

You could replace the 0 for standard input with FILENO_STDIN from <fcntl.h>.

2nd (the major) Problem

    nfds_t nfds = sizeof(pollArray);

The size of the poll array is probably 16 (bytes) - on most but not all machines (32-bit and 64-bit). You need the dimension of the poll array (which is 2). This is why all hell breaks loose; the system is looking at garbage and getting confused.

Addressing a comment:

To find the dimension of an array defined in the local file or function (but not an array parameter passed into a function, nor an array defined in another file), use a variant of the macro:

#define DIM(x) (sizeof(x)/sizeof(*(x)))

This name harks back to the use of BASIC in the dim, distant past; other names I've seen are NELEMS or ARRAY_SIZE or DIMENSION (harking back to Fortran IV), and I'm sure there are lots of others.

What's happening is that because you are not setting nfds to 2, the system call is reading data after the actual struct pollfd array, and trying to make head or tail of stuff that is not a struct pollfd. In particular, it is probably writing into what you've told it is the revents field of a row in the struct pollfd array, but the actual space is the log FILE *, so that is completely screwed up. Similarly for other local variables. In other words, you've got a stack buffer overflow - aka Stack Overflow, a name that should be faintly familiar. But it is happening because you programmed it.

Fix:

    nfds_t nfds = DIM(pollArray);

3rd (medium grade) problem

   poll(pollArray, nfds, 1);
   if (errcode < 0) {

The result of poll() is not saved, and the variable errcode is never assigned a value, yet you check what the value is immediately afterwards. The corrected code would probably read:

errcode = poll(pollArray, nfds, 1);
if (errcode < 0)
{
    fprintf(stderr, "POLL returned with error %d!\n", errcode);
    eofFlag = 1;
}

Note the newline character added to the error message - you need it. Or:

if (poll(pollArray, nfds, 1) < 0)
{
    int errnum = errno;
    fprintf(stderr, "POLL returned with error (%d: %s)\n",
            errnum, strerror(errnum));
    eofFlag = 1;
}

In the second case, you'd add '#include <errno.h>' to the header list. Saving the value of errno preserves it against change by function calls - but you can only reliably test errno when a function (system call) has failed. Even successful function calls may leave errno non-zero. (For example, on some systems, if stderr is not going to a terminal, the value of errno after an I/O call is ENOTTY, even though the call as a whole succeeded.)


Previous ruminations

Some prior thoughts on what might be the problem; I think there is still some useful info down here.

I suspect your problem is that poll() 'damages' the set of polled descriptors, and you have to rebuild it on each loop. (Having checked the manual page at the Open Group, it appears that poll() does not have the problems that select() suffers from.) This is certainly a problem with the related select() system call.

Your child code is not closing all the file descriptors when it should - you've commented out one 'close()` and there is another missing altogether. When the child has finished connecting pipes to standard input and output, you don't want the un-dupped file descriptors still open; the processes cannot detect EOF properly.

Similar comments may apply in the parent.

Also, note that the sending process might need to send multiple packets of data to the child before anything appears on the child's standard output. As an extreme case, consider 'sort'; that reads all its data before generating any output. I worry about the direction switching code, therefore, though I've not fully digested what it does. Of itself, the direction switching is harmless - it simply writes the new direction when it starts writing in the opposite direction from last time.

More seriously, don't use single character reads and writes; read sensible size buffers full. Sensible size might be almost any power of two between 256 and 8192; you could choose other sizes at liberty (the size of the pipe buffer might be a good size to choose). Handling multiple characters at a time will vastly improve the performance.


The way I have solved similar issues is by having two processes doing the monitoring, one for the standard input and the other for standard output - or the equivalents. This means that I don't need to use poll() (or select()) at all. The process handling the standard input reads and blocks waiting for more information; when something arrives, it logs it and writes it to the childs standard input. Similarly for the process handling standard output.

I can dig out the code that works with pipes if you need it (see my profile). I looked at it a year or two ago (hmmm; last edits in 2005 in fact, though I recompiled it in 2007) and it was still in working order (it was written circa 1989). I also have code that works on sockets instead of pipes. They'd need some adaptation to suit your requirements; they were rather specialized (and the pipe version, in particular, is aware of a client-server database protocol and attempts to handle complete packets of information).

Jonathan Leffler
Holy cow. When I went home yesterday, I was thinking this would be one of those 'lost in the ether' questions, and then I get in this morning to this! Tremendous thanks; I'll get to work fixing those and let you know how it goes.
SparroHawc
Alright... I've tried several different iterations of sizeof, as well as looking up different implementations - and poll() still makes me lose my logFile handle. The moment I poll(), nothing can be written to the logfile; it's like it was fclose()d prematurely. I'm thinking I may refactor this thing to use two watcher threads, as you mentioned you've done in the past. At least then I won't have to keep fighting with this thing.
SparroHawc
A few edits later, and my listing above is completely out of date... The file handle doesn't get lost any more, thanks to your noting that I'm supposed to be using an item count instead of a byte-count sizeof, but the same problem still happens with my from-child poll returning a POLLERR. I'm wondering if the pipe widows because of the program terminating, with data still waiting to write from it.
SparroHawc
Probably time to write a new question, with just the current problem, and a subject such as "Why does poll() return POLLERR in this case?".
Jonathan Leffler
Good suggestion. Thanks again for all your help!
SparroHawc