I asked a question about this a few days ago. My solution is along the lines of what was suggested in the accepted answer there. However, a friend of mine came up with the following solution:

Please note that the code has been updated a few times (check the edit revisions) to reflect the suggestions in the answers below. If you intend to give a new answer, please do so with this new code in mind and not the old one which had lots of problems.

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char *argv[]){
    int fd[2], i, aux, std0, std1;

    do {
     std0 = dup(0); // backup stdin
     std1 = dup(1); // backup stdout

     // let's pretend I'm reading commands here in a shell prompt
     READ_COMMAND_FROM_PROMPT();

     for(i=1; i<argc; i++) {
      // do we have a previous command?
      if(i > 1) {
       dup2(aux, 0);
       close(aux);
      }

      // do we have a next command?
      if(i < argc-1) {
       pipe(fd);

       aux = fd[0];
       dup2(fd[1], 1);
       close(fd[1]);
      }

      // last command? restore stdout...
      if(i == argc-1) {
       dup2(std1, 1);
       close(std1);
      }

      if(!fork()) {
       // if not last command, close all pipe ends
       // (the child doesn't use them)
       if(i < argc-1) {
        close(std0);
        close(std1);
        close(fd[0]);
       }

       execlp(argv[i], argv[i], NULL);
       exit(0);
      }
     }

     // restore stdin to be able to keep using the shell
     dup2(std0, 0);
     close(std0);
    }

    return 0;
}

This simulates a series of commands through pipes like in bash, for instance: cmd1 | cmd2 | ... | cmd_n. I say "simulate", because, as you can see, the commands are actually read from the arguments. Just to spare time coding a simple shell prompt...

Of course there are some things to fix and add, like error handling, but that's not the point here. I think I kind of get the code, but I'm still quite confused about how this whole thing works.

Am I missing something, or does this really work, and is it a nice and clean solution to the problem? If not, can anyone point out the crucial problems this code has?

+4  A: 
ephemient
That's quite a diagram!
Jonathan Leffler
Indeed... Although I still find it confusing, lol, I think I get how it all works now. And yes, we are aware that it's missing some close() calls, though maybe not as many as the ones you guys are pointing out. We also knew about the lost stdin; I found that out the hard way, with a segmentation fault when returning to the shell. I have to be honest though, ephemient: I couldn't understand your solution. I tried and tried to get my head around it but couldn't, so I tried it my own way using a few tips from your code. I'll post my full code in a new answer below...
Nazgulled
I've updated the code in the first answer a few times now and I think I got it in this last one. Could you please take a look and see if I'm missing anything important?
Nazgulled
I understand now what you are saying about not doing those changes in the parent. Like I said, this was my friend's approach, and although I didn't know why, I wasn't very comfortable with it. I forgot to post my own code; I'm going to do it in a few seconds in an answer below...
Nazgulled
I was looking (again) at the "diagram" for your own solution and I'm not sure, but I think you have an error. On the second command iteration, in the parent, you check for a next command and, if there is one, set old_fds = new_fds. Shouldn't old_fds be equal to {5, 6} instead of {3, 4} as you have there?
Nazgulled
I think I got it working now with your solution and everything makes sense, especially after reading that "diagram" a few times. :) Still, I'm waiting for your answer to my previous comment; I believe it was just a typo on your part and I want to make sure.
Nazgulled
It's likely that the diagram has some copy'n'paste errors, but the specific FD numbers don't matter.
ephemient
Actually, in this specific case, I understand it much better without the splitting... I'm going to post my final code (with your solution) below...
Nazgulled
+3  A: 

The key problem is that you create a bunch of pipes and don't make sure that all the ends are closed properly. If you create a pipe, you get two file descriptors; if you fork, then you have four file descriptors. If you dup() or dup2() one end of the pipe to a standard descriptor, you need to close both ends of the pipe - at least one of the closes must be after the dup() or dup2() operation.


Consider the file descriptors available to the first command (assuming there are at least two - something that should be handled in general (no pipe() or I/O redirection needed with just one command), but I recognize that the error handling is eliminated to keep the code suitable for SO):

    std=dup(1);    // Likely: std = 3
    pipe(fd);      // Likely: fd[0] = 4, fd[1] = 5
    aux = fd[0];
    dup2(fd[1], 1);
    close(fd[1]);  // Closes 5

    if (fork() == 0) {
         // Need to close: fd[0] aka aux = 4
         // Need to close: std = 3
         close(fd[0]);
         close(std);
         execlp(argv[i], argv[i], NULL);
         exit(1);
    }

Note that because fd[0] is not closed in the child, the child will never get EOF on its standard input; this is usually problematic. The non-closure of std is less critical.
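
To make the closing rule concrete, here is a minimal sketch, assuming a hypothetical helper named count_bytes_from_child (it is not part of the question's code). It illustrates that the reading side only sees EOF once every copy of the write end has been closed, including the copy held by the reader's own process.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that writes msg into a pipe,
 * then read in the parent until EOF. Returns bytes read, or -1. */
long count_bytes_from_child(const char *msg) {
    int fd[2];
    if (pipe(fd) == -1) return -1;

    pid_t pid = fork();
    if (pid == -1) { close(fd[0]); close(fd[1]); return -1; }

    if (pid == 0) {                 /* child: writer */
        close(fd[0]);               /* the child never reads */
        size_t len = strlen(msg);
        ssize_t w = write(fd[1], msg, len);
        close(fd[1]);
        _exit(w == (ssize_t)len ? 0 : 1);
    }

    /* parent: reader */
    close(fd[1]);   /* without this, read() below would never return 0 */

    long total = 0;
    char buf[256];
    ssize_t n;
    while ((n = read(fd[0], buf, sizeof buf)) > 0)
        total += n;

    close(fd[0]);
    waitpid(pid, NULL, 0);
    return n == 0 ? total : -1;     /* n == 0 means a clean EOF */
}
```

If the close(fd[1]) in the parent is removed, the loop blocks forever after the child exits, because the parent itself still holds a write end open.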


Revisiting amended code (as of 2009-06-03T20:52-07:00)...

Assume that process starts with file descriptors 0, 1, 2 (standard input, output, error) open only. Also assume we have exactly 3 commands to process. As before, this code writes out the loop with annotations.

std0 = dup(0); // backup stdin - 3
std1 = dup(1); // backup stdout - 4

// Iteration 1 (i == 1)
// We have another command
pipe(fd);   // fd[0] = 5; fd[1] = 6
aux = fd[0]; // aux = 5
dup2(fd[1], 1);
close(fd[1]);       // 6 closed
// Not last command
if (fork() == 0) {
    // Not last command
    close(std1);    // 4 closed
    close(fd[0]);   // 5 closed
    // Minor problemette: 3 still open
    execlp(argv[i], argv[i], NULL);
    }
// Parent has open 3, 4, 5 - no problem

// Iteration 2 (i == 2)
// There was a previous command
dup2(aux, 0);      // stdin now on read end of pipe
close(aux);        // 5 closed
// We have another command
pipe(fd);          // fd[0] = 5; fd[1] = 6
aux = fd[0];
dup2(fd[1], 1);
close(fd[1]);      // 6 closed
// Not last command
if (fork() == 0) {
    // Not last command
    close(std1);   // 4 closed
    close(fd[0]);  // 5 closed
    // As before, 3 is still open - not a major problem
    execlp(argv[i], argv[i], NULL);
    }
// Parent has open 3, 4, 5 - no problem

// Iteration 3 (i == 3)
// We have a previous command
dup2(aux, 0);      // stdin is now read end of pipe 
close(aux);        // 5 closed
// No more commands

// Last command - restore stdout...
dup2(std1, 1);     // stdout is back where it started
close(std1);       // 4 closed

if (fork() == 0) {
    // Last command
    // 3 still open
    execlp(argv[i], argv[i], NULL);
}
// Parent has closed 4 when it should not have done so!!!
// End of loop
// restore stdin to be able to keep using the shell
dup2(std0, 0);
// 3 still open - as desired

So, all the children have the original standard input connected as file descriptor 3. This is not ideal, though it is not dreadfully traumatic; I'm hard pressed to find a circumstance where this would matter.

Closing file descriptor 4 in the parent is a mistake - the next iteration of 'read a command and process it' won't work because std1 is not initialized inside the loop.

Generally, this is close to correct - but not quite correct.
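
The backup-and-restore point can be sketched in isolation. The helper below, capture_stdout, is hypothetical and only meant as an illustration: it takes a fresh backup of stdout on every call, so closing the backup after restoring is harmless. Closing it is only a problem when the backup is made once, outside the loop.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: temporarily point fd 1 at a pipe, write msg to it,
 * restore fd 1, and return how many bytes were captured into out.
 * msg is assumed to be short enough to fit in the pipe buffer. */
ssize_t capture_stdout(const char *msg, char *out, size_t cap) {
    int fd[2];
    int saved = dup(STDOUT_FILENO);       /* fresh backup on every call */
    if (saved == -1) return -1;
    if (pipe(fd) == -1) { close(saved); return -1; }

    dup2(fd[1], STDOUT_FILENO);           /* stdout now refers to the write end */
    close(fd[1]);                         /* fd 1 is the only write end left */

    ssize_t w = write(STDOUT_FILENO, msg, strlen(msg));

    dup2(saved, STDOUT_FILENO);           /* restore; this also closes the write end */
    close(saved);                         /* safe: the next call makes a new backup */

    ssize_t n = (w < 0) ? -1 : read(fd[0], out, cap);
    close(fd[0]);
    return n;
}
```

Calling it twice in a row works because each call starts with its own dup(); with a single backup taken before the loop, the second call would be restoring from an already closed descriptor.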

Jonathan Leffler
Like I said, we were aware of some missing closes, but not those in the child... One thing though: is closing fd[0] the same as closing aux in the child?
Nazgulled
With all your suggestions, I tried to fix the "pseudo" code in the first post, can you please take a look at the new code and see if I missed something?
Nazgulled
Just in case you were looking at the edited code I posted a few minutes ago, I've updated it again, I think it's better now.
Nazgulled
Because aux is a copy of fd[0], it doesn't matter which of the two you pass to close() -- the same file descriptor will be closed.
Jonathan Leffler
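
A small sketch of that last point, using a hypothetical function alias_close_demo and assuming a single-threaded program: aux holds a copy of the number, not a second descriptor, so a second close() on the same number fails with EBADF.

```c
#include <errno.h>
#include <unistd.h>

/* Returns 1 if closing through aux also invalidated fd[0],
 * 0 if it did not, and -1 if the pipe could not be created. */
int alias_close_demo(void) {
    int fd[2];
    if (pipe(fd) == -1) return -1;

    int aux = fd[0];          /* copies the number, not the descriptor */
    close(aux);               /* closes the read end */

    errno = 0;
    int r = close(fd[0]);     /* same number again: it is already closed */
    int already_closed = (r == -1 && errno == EBADF);

    close(fd[1]);
    return already_closed;
}
```

This differs from dup(), which does create a second, independent descriptor that has to be closed separately.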
Looking at my last revision, I think I'm missing a close(std0) after the last dup2() that restores stdin. And since I'm closing std0 and std1, I think I need to make the stdin/stdout backups with dup() inside the loop, before READ_COMMAND_FROM_PROMPT(). Am I right about all this? Other than that, am I missing something?
Nazgulled
Once again, I've updated the code to reflect the changes I just pointed out in my previous comment. I think I got everything right now, but I'm not the expert here, so I'm waiting for your comments...
Nazgulled
Ships passing in the night. There were indeed issues with not closing std0 in the child process. And there was an issue with std1 being closed in the outer (current pipeline) loop and not reinitialized. So I'd say that you were about correct now, yes.
Jonathan Leffler
I see the problem with not closing std0 in the children, but I don't see the problem with closing both std0 and std1 in the parent (you seem to have removed those close calls from your code above). I tested my code with closing std0/std1 in the parent and the shell works fine...
Nazgulled
In fact, it works both ways. Doing this "dup2(std1, 1);close(std1);" when it's the last command and this "dup2(std0, 0);close(std0);" after all commands have been executed works in the same way as doing just this "dup2(std1, 1);" and "dup2(std0, 0);" (in the appropriate places of course). It's the "same" as in "it works", I return to the shell prompt and both the standard input/output works, if I type a command without piping any other, it works just fine both ways. So, should I close them or not, is it indifferent or not?
Nazgulled
@Nazgulled: If you notice the timestamp quoted, and then look at the revision of your code at the time in question, I think you'll see I accurately analyzed what was then current. I have not amended my analysis to cover the revision(s) you made while I was doing my analysis.
Jonathan Leffler
I thought you were talking about the latest one; I shouldn't have edited it that many times. Anyway, I just did another edit a few seconds ago, and just to make it clear: 1) Is everything OK now? Am I closing everything that I'm supposed to? 2) What about the question in my previous comment: why does the code work both with and without closing std1 and std0 in the parent? I don't get it... 3) Am I missing anything?
Nazgulled
@Nazgulled: I've looked at the latest code - I think it is OK (though the 'do' loop doesn't have a 'while' at the end - it is pseudo-code still). I have not done a formal paper-execution of the code as I did previously. You should do so to satisfy yourself that it is correct.
Jonathan Leffler
Ok, thanks. You've been very helpful just as ephemient, if only I could mark both of your answers as accepted :/ Not sure which one to mark though...
Nazgulled
+1  A: 

It will give results, some of which are not expected. It is far from a nice solution: it messes with the parent process's standard descriptors, does not recover the standard input, leaks descriptors to children, etc.

If you think recursively, it may be easier to understand. Below is a correct solution, without error checking. Consider a linked-list type command, with its next pointer and an argv array.

void run_pipeline(command *cmd, int input) {
  int pfds[2] = { -1, -1 };

  if (cmd->next != NULL) {
    pipe(pfds);
  }
  if (fork() == 0) { /* child */
    if (input != -1) {
      dup2(input, STDIN_FILENO);
      close(input);
    }
    if (pfds[1] != -1) {
      dup2(pfds[1], STDOUT_FILENO);
      close(pfds[1]);
    }
    if (pfds[0] != -1) {
      close(pfds[0]);
    }
    execvp(cmd->argv[0], cmd->argv);
    exit(1);
  }
  else { /* parent */
    if (input != -1) {
      close(input);
    }
    if (pfds[1] != -1) {
      close(pfds[1]);
    }
    if (cmd->next != NULL) {
      run_pipeline(cmd->next, pfds[0]);
    }
  }
}

Call it with the first command in the linked-list, and input = -1. It does the rest.
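
For readers who want to try this, here is a self-contained sketch. The struct layout, the name run_pipeline_pid, and the echo_hello_grep driver are all assumptions made for illustration; the answer only says the type is a linked list with a next pointer and an argv array. The variant below differs from the code above in that it returns the pid of the last stage so the caller can wait for it, and it assumes echo and grep are on the PATH.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumed layout of the linked-list command type. */
typedef struct command {
    char **argv;              /* NULL-terminated; argv[0] is the program */
    struct command *next;     /* next stage of the pipeline, or NULL */
} command;

/* Variant of run_pipeline that returns the pid of the last stage, or -1. */
pid_t run_pipeline_pid(command *cmd, int input) {
    int pfds[2] = { -1, -1 };

    if (cmd->next != NULL && pipe(pfds) == -1) return -1;

    pid_t pid = fork();
    if (pid == 0) {                        /* child */
        if (input != -1)   { dup2(input, STDIN_FILENO);    close(input); }
        if (pfds[1] != -1) { dup2(pfds[1], STDOUT_FILENO); close(pfds[1]); }
        if (pfds[0] != -1) close(pfds[0]);
        execvp(cmd->argv[0], cmd->argv);
        _exit(127);                        /* exec failed */
    }

    /* parent (also reached if fork failed): drop the ends it no longer needs */
    if (input != -1)   close(input);
    if (pfds[1] != -1) close(pfds[1]);
    if (pid == -1) { if (pfds[0] != -1) close(pfds[0]); return -1; }

    return cmd->next ? run_pipeline_pid(cmd->next, pfds[0]) : pid;
}

/* Runs `echo hello | grep -q <pattern>` and returns grep's exit status. */
int echo_hello_grep(const char *pattern) {
    char *echo_argv[] = { "echo", "hello", NULL };
    char *grep_argv[] = { "grep", "-q", (char *)pattern, NULL };
    command second = { grep_argv, NULL };
    command first  = { echo_argv, &second };

    int status;
    pid_t last = run_pipeline_pid(&first, -1);
    if (last == -1 || waitpid(last, &status, 0) == -1) return -1;
    while (wait(NULL) > 0) { }             /* reap the earlier stages too */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note that the parent's own standard input and output are never touched; all redirection happens in the child between fork and exec.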

Juliano
Actually, I think this is even more confusing than the code above which I already understand. Thanks for your input though.
Nazgulled
One thing though: is it really that bad to mess with the parent's descriptors if I restore them, like I'm doing in the updated code? Please look at the new code if you haven't done so.
Nazgulled
You should look at my code again; it is not that complicated. And yes, it is still bad. The whole point of fork+exec is that you can "prepare the house" (descriptors, permissions, capabilities, environment variables, etc.) between the `fork` and the `exec` calls. Doing everything in the parent overcomplicates something intended to be simple.
Juliano
I'm not convinced that explaining it recursively is any better than explaining it iteratively, but... +1 for hammering in those points, or at least trying to.
ephemient
A: 

Both in this question and in another (linked in the first post), ephemient suggested a solution to the problem that doesn't mess with the parent's file descriptors, unlike the possible solution shown in this question.

I didn't get his solution. I tried and tried to understand it, but I can't seem to get it. I also tried to code it without understanding it, but it didn't work, probably because I failed to understand it correctly and wasn't able to code it the way it should have been coded.

Anyway, I tried to come up with my own solution using some of the things I understood from the pseudo code and came up with this:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
#include <string.h>
#include <strings.h>
#include <readline/readline.h>
#include <readline/history.h>

#define NUMPIPES 5
#define NUMARGS 10

int main(int argc, char *argv[]) {
    char *bBuffer, *sPtr, *aPtr = NULL, *pipeComms[NUMPIPES], *cmdArgs[NUMARGS];
    int aPipe[2], bPipe[2], pCount, aCount, i, status;
    pid_t pid;

    using_history();

    while(1) {
     bBuffer = readline("\e[1;31mShell \e[1;32m# \e[0m");

     if(bBuffer == NULL || !strcasecmp(bBuffer, "exit")) { // readline() returns NULL on EOF
      return 0;
     }

     if(strlen(bBuffer) > 0) {
      add_history(bBuffer);
     }

     sPtr = bBuffer;
     pCount =0;

     do {
      aPtr = strsep(&sPtr, "|");

      if(aPtr != NULL) {
       if(strlen(aPtr) > 0) {
        pipeComms[pCount++] = aPtr;
       }
      }
     } while(aPtr);

     cmdArgs[pCount] = NULL;

     for(i = 0; i < pCount; i++) {
      aCount = 0;

      do {
       aPtr = strsep(&pipeComms[i], " ");

       if(aPtr != NULL) {
        if(strlen(aPtr) > 0) {
         cmdArgs[aCount++] = aPtr;
        }
       }
      } while(aPtr);

      cmdArgs[aCount] = NULL;

      // Do we have a next command?
      if(i < pCount-1) {
       // Is this the first, third, fifth, etc... command?
       if(i%2 == 0) {
        pipe(aPipe);
       }

       // Is this the second, fourth, sixth, etc... command?
       if(i%2 == 1) {
        pipe(bPipe);
       }
      }

      pid = fork();

      if(pid == 0) {
       // Is this the first, third, fifth, etc... command?
       if(i%2 == 0) {
        // Do we have a previous command?
        if(i > 0) {
         close(bPipe[1]);
         dup2(bPipe[0], STDIN_FILENO);
         close(bPipe[0]);
        }

        // Do we have a next command?
        if(i < pCount-1) {
         close(aPipe[0]);
         dup2(aPipe[1], STDOUT_FILENO);
         close(aPipe[1]);
        }
       }

       // Is this the second, fourth, sixth, etc... command?
       if(i%2 == 1) {
        // Do we have a previous command?
        if(i > 0) {
         close(aPipe[1]);
         dup2(aPipe[0], STDIN_FILENO);
         close(aPipe[0]);
        }

        // Do we have a next command?
        if(i < pCount-1) {
         close(bPipe[0]);
         dup2(bPipe[1], STDOUT_FILENO);
         close(bPipe[1]);
        }
       }

       execvp(cmdArgs[0], cmdArgs);
       exit(1);
      } else {
       // Do we have a previous command?
       if(i > 0) {
        // Is this the first, third, fifth, etc... command?
        if(i%2 == 0) {
         close(bPipe[0]);
         close(bPipe[1]);
        }

        // Is this the second, fourth, sixth, etc... command?
        if(i%2 == 1) {
         close(aPipe[0]);
         close(aPipe[1]);
        }
       }

       // wait for the last command? all others will run in the background
       if(i == pCount-1) {
        waitpid(pid, &status, 0);
       }

       // I know they will be left as zombies in the table
       // Not relevant for this...
      }
     }
    }

    return 0;
}

This may not be the best or cleanest solution, but it was something I could come up with and, most importantly, something I can understand. What good is it to have something that works but that I don't understand, if my teacher then evaluates me and I can't explain to him what the code is doing?

Anyway, what do you think about this one?

Nazgulled
A: 

This is my "final" code with ephemient's suggestions:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
#include <string.h>
#include <strings.h>
#include <readline/readline.h>
#include <readline/history.h>

#define NUMPIPES 5
#define NUMARGS 10

int main(int argc, char *argv[]) {
    char *bBuffer, *sPtr, *aPtr = NULL, *pipeComms[NUMPIPES], *cmdArgs[NUMARGS];
    int newPipe[2], oldPipe[2], pCount, aCount, i, status;
    pid_t pid;

    using_history();

    while(1) {
     bBuffer = readline("\e[1;31mShell \e[1;32m# \e[0m");

     if(bBuffer == NULL || !strcasecmp(bBuffer, "exit")) { // readline() returns NULL on EOF
      return 0;
     }

     if(strlen(bBuffer) > 0) {
      add_history(bBuffer);
     }

     sPtr = bBuffer;
     pCount = -1;

     do {
      aPtr = strsep(&sPtr, "|");

      if(aPtr != NULL) {
       if(strlen(aPtr) > 0) {
        pipeComms[++pCount] = aPtr;
       }
      }
     } while(aPtr);

     cmdArgs[++pCount] = NULL;

     for(i = 0; i < pCount; i++) {
      aCount = -1;

      do {
       aPtr = strsep(&pipeComms[i], " ");

       if(aPtr != NULL) {
        if(strlen(aPtr) > 0) {
         cmdArgs[++aCount] = aPtr;
        }
       }
      } while(aPtr);

      cmdArgs[++aCount] = NULL;

      // do we have a next command?
      if(i < pCount-1) {
       pipe(newPipe);
      }

      pid = fork();

      if(pid == 0) {
       // do we have a previous command?
       if(i > 0) {
        close(oldPipe[1]);
        dup2(oldPipe[0], 0);
        close(oldPipe[0]);
       }

       // do we have a next command?
       if(i < pCount-1) {
        close(newPipe[0]);
        dup2(newPipe[1], 1);
        close(newPipe[1]);
       }

       // execute command...
       execvp(cmdArgs[0], cmdArgs);
       exit(1);
      } else {
       // do we have a previous command?
       if(i > 0) {
        close(oldPipe[0]);
        close(oldPipe[1]);
       }

       // do we have a next command?
       if(i < pCount-1) {
        oldPipe[0] = newPipe[0];
        oldPipe[1] = newPipe[1];
       }

       // wait for last command process?
       if(i == pCount-1) {
        waitpid(pid, &status, 0);
       }
      }
     }
    }

    return 0;
}

Is it ok now?

Nazgulled
I haven't run it to be sure, but it looks fine. Little nits: it wouldn't hurt to take the condition out before `oldPipe = newPipe`; `if (i == pcount - 1)` could just be an `else` of the previous `if`, but I would move it out of the loop altogether. These don't change how your program runs, though; they're more stylistic concerns.
ephemient
You're right about the second point; it makes more sense and the 'if' is unnecessary. But the whole wait code needs tweaking anyway, because I never wait for all the previous commands and they will become zombies this way. That's really not important for this exercise, though. As for the first point, I only followed the pseudo code in your answer above. But if I removed the condition, wouldn't I be left with 2 open file descriptors in the last command?
Nazgulled
It's just an assignment of numbers. The actual file descriptors in newPipe[] aren't in use when you're at the very last command. This really is just personal taste... it's unnecessary, but I like to eliminate conditionals.
ephemient
Ok, thanks for the tip then :)
Nazgulled
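
On the zombie point raised in the comments above, one possible approach is to remember every child's pid and wait for all of them once the pipeline has been started. This is only a sketch; wait_for_pipeline and demo_reap are hypothetical names, and the children here simply exit instead of exec'ing a command.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Wait for every stage so none is left as a zombie. Returns the exit
 * status of the last stage (as a shell would report it), or -1. */
int wait_for_pipeline(const pid_t *pids, int count) {
    int last = -1;
    for (int i = 0; i < count; i++) {
        int status;
        /* skip slots where fork() failed, and ignore waitpid errors */
        if (pids[i] <= 0 || waitpid(pids[i], &status, 0) == -1) continue;
        if (i == count - 1 && WIFEXITED(status))
            last = WEXITSTATUS(status);
    }
    return last;
}

/* Stand-in for the fork() calls in the shell loop: child i exits with code i. */
int demo_reap(int count) {
    pid_t pids[16];
    if (count < 1 || count > 16) return -1;
    for (int i = 0; i < count; i++) {
        pids[i] = fork();
        if (pids[i] == 0) _exit(i);
    }
    return wait_for_pipeline(pids, count);
}
```

In the shell loop above, this would mean storing pid in an array on each iteration and calling the wait function once after the for loop, instead of waiting only when i == pCount-1.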