tags:

views:

1546

answers:

5
+9  Q: 

exec and fork()

What are the differences between fork() and exec()?

+4  A: 

fork() splits the current process into two processes. Or in other words, your nice linear easy to think of program suddenly becomes two separate programs running one piece of code:

 int pid = fork();

 if (pid == 0)
 {
     printf("I'm the child");
 }
 else
 {
     printf("I'm the parent, my child is %i", pid);
     // here we can kill the child, but that's not very parently of us
 }

This can kind of blow your mind. Now you have one piece of code with pretty much identical state being executed by two processes. The child process inherits all the code and memory of the process that just created it, including starting from where the "fork()" call just left off. The only difference is the fork() code return to tell you if you are the parent or the child. If you are the parent, the return is the id of the child.

exec is a bit easier to grasp, you just tell exec to execute a process using the target executable and you don't have two processes running the same code or inheriting the same state. Like @Steve Hawkins says, exec can be used after you fork to execute in the current process the target executable.

Doug T.
there's also the condition when `pid < 0` and the `fork()` call failed
Jonathan Fingland
That doesn't blow my mind at all :-) One piece of code being executed by two processes happens every time a shared library or DLL is being used.
paxdiablo
+2  A: 

The same as the difference between apples and oranges.

fork() creates a copy of the current process, with execution in the new child starting from just after the fork() call. After the fork(), they're identical, except for the return value of the fork() function. (RTFM for more details.) The two processes can then diverge still further, with one unable to interfere with the other, except possibly through any shared file handles.

exec() replaces the current process with a new one. It has nothing to do with fork(), except that an exec() often follows fork() when what's wanted is to launch a different child process, rather than replace the current one.

Warren Young
+4  A: 

They are use together to create a new child process. First, calling fork creates a copy of the current process (the child process). Then, exec is called from within the child process to "replace" the copy of the parent process with the new process.

The process goes something like this:

child = fork();  //Fork returns a PID for the parent process, or 0 for the child, or -1 for Fail

if (child < 0) {
    std::cout << "Failed to fork GUI process...Exiting" << std::endl;
    exit (-1);
} else if (child == 0) {       // This is the Child Process
    // Call one of the "exec" functions to create the child process
    execvp (argv[0], const_cast<char**>(argv));
} else {                       // This is the Parent Process
    //Continue executing parent process
}
Steve Hawkins
+20  A: 

The use of fork and exec exemplifies the spirit of UNIX in that it provides a very simple way to start new processes.

The fork call basically makes a duplicate of the current process, identical in almost every way (not everything is copied over, for example, resource limits in some implementations but the idea is to create as close a copy as possible).

The new process (child) gets a different process ID (PID) and has the the PID of the old process (parent) as its parent PID (PPID). Because the two processes are now running exactly the same code, they can tell which is which by the return code of fork - the child gets 0, the parent gets the PID of the child. This is all, of course, assuming the fork call works - if not, no child is created and the parent gets an error code.

The exec call is a way to basically replace the entire current process with a new program. It loads the program into the current process space and runs it from the entry point.

So, fork and exec are often used in sequence to get a new program running as a child of a current process. Shells typically do this whenever you try to run a program like find - the shell forks, then the child loads the find program into memory, setting up all command line arguments, standard I/O and so forth.

But they're not required to be used together. It's perfectly acceptable for a program to fork itself without execing if, for example, the program contains both parent and child code (you need to be careful what you do, each implementation may have restrictions). This was used quite a lot (and still is) for daemons which simply listen on a TCP port and fork a copy of themselves to process a specific request while the parent goes back to listening.

Similarly, programs that know they're finished and just want to run another program don't need to fork, exec and then wait for the child. They can just load the child directly into their process space.

Some UNIX implementations have an optimized fork which uses what they call copy-on-write. This is a trick to delay the copying of the process space in fork until the program attempts to change something in that space. This is useful for those programs using only fork and not exec in that they don't have to copy an entire process space.

If the exec is called following fork (and this is what happens mostly), that causes a write to the process space and it is then copied for the child process.

Note that there is a whole family of exec calls (execl, execle, execve and so on) but exec in context here means any of them.

The following diagram illustrates the typical fork/exec operation where the bash shell is used to list a directory with the ls command:

+--------+
| pid=7  |
| ppid=4 |
| bash   |
+--------+
    |
    | calls fork
    V
+--------+             +--------+
| pid=7  |    forks    | pid=22 |
| ppid=4 | ----------> | ppid=7 |
| bash   |             | bash   |
+--------+             +--------+
    |                      |
    | waits for pid 22     | calls exec to run ls
    |                      V
    |                  +--------+
    |                  | pid=22 |
    |                  | ppid=7 |
    |                  | ls     |
    V                  +--------+
+--------+                 |
| pid=7  |                 | exits
| ppid=4 | <---------------+
| bash   |
+--------+
    |
    | continues
    V
paxdiablo
A: 

I think some concepts from "Advanced Unix Programming" by Marc Rochkind were helpful in understanding the different roles of fork()/exec(), especially for someone used to the Windows CreateProcess() model:

A program is a collection of instructions and data that is kept in a regular file on disk. (from 1.1.2 Programs, Processes, and Threads)

.

In order to run a program, the kernel is first asked to create a new process, which is an environment in which a program executes. (also from 1.1.2 Programs, Processes, and Threads)

.

It’s impossible to understand the exec or fork system calls without fully understanding the distinction between a process and a program. If these terms are new to you, you may want to go back and review Section 1.1.2. If you’re ready to proceed now, we’ll summarize the distinction in one sentence: A process is an execution environment that consists of instruction, user-data, and system-data segments, as well as lots of other resources acquired at runtime, whereas a program is a file containing instructions and data that are used to initialize the instruction and user-data segments of a process. (from 5.3 exec System Calls)

Once you understand the distinction between a program and a process, the behavior of fork() and exec() function can be summarized as:

  • fork() creates a duplicate of the current process
  • exec() replaces the program in the current process with another program

(this is essentially a simplified 'for dummies' version of paxdiablo's much more detailed answer)

Michael Burr