I understand that execve() and family require the first element of the argument array to be the same as the executable named by the first argument. That is, in this:

execve(prog, args, env);

args[0] will usually be the same as prog. But I can't seem to find information as to why this is.

I also understand that executables (er, at least shell scripts) always have their calling path as the first argument when running, but I would think that the shell would do the work to put it there. execve() would then just call the executable using the path given in its first argument ("prog" from above) and pass the argument array ("args" from above) as one would on the command line. That is, I don't call scripts on the command line with a duplicate executable path in the args list:

/bin/ls /bin/ls /home/john

Can someone explain?

+1  A: 

According to this, the first argument being the program name is a custom.

by custom, the first element should be the name of the executed program (for example, the last component of path)

That said, these values could be different. If, for example, the program was launched through a symbolic link, the program name might differ from the name of the link used to launch it.

And, you are right. The shell would normally do the work of setting up the first argument. In this case however, the use of execve circumvents the shell altogether - which is why you need to set it up yourself.

Tim Kane
Thanks for the answer
EBM
+1  A: 

It allows you to specify the exact path to the executable to be loaded, but also allows for a "beautified" name to be presented in tools such as ps or top.

exec*('/bin/ls', 'ls', '/home/john');
Ignacio Vazquez-Abrams
You need double quotes; and probably a null char * to terminate the list of arguments, assuming you're using execl().
Jonathan Leffler
I'll take your word for it, but interestingly, I created a test that executes a simple bash script that spits out $0, and even when I use a "prettified" program name, $0 will still be the full path (constructed from execve()'s first argument, then, I assume).
EBM
@Jonathan: Blame Python.
Ignacio Vazquez-Abrams
Some programs also use `argv[0]` to figure out what to do; busybox, for example, does this. I believe bash also changes its behavior depending on what you invoke it as (`bash` or just `sh`).
Thanatos
+1  A: 

There is no requirement that the first of the arguments bear any relation to the name of the executable:

#include <unistd.h>

int main(void)
{
    char *args[3] = { "rip van winkle", "30", 0 };
    execv("/bin/sleep", args);
    return 1;
}

Try it - on a Mac (after three tests):

make x; ./x & sleep 1; ps

The output on the third run was:

MiniMac JL: make x; ./x & sleep 1; ps
make: `x' is up to date.
[3] 5557
  PID TTY           TIME CMD
 5532 ttys000    0:00.04 -bash
 5549 ttys000    0:00.00 rip van winkle 30
 5553 ttys000    0:00.00 rip van winkle 30
 5557 ttys000    0:00.00 rip van winkle 30
MiniMac JL: 

EBM comments:

Yeah, and this makes it even more weird. In my test bash script (the target of the execve), I don't see the value of what execve has in arg[0] anywhere -- not in the environment, and not as $0.

Revising the experiment - a script called 'bash.script':

#!/bin/bash

echo "bash script at sleep (0: $0; *: $*)"
sleep 30

And a revised program:

#include <unistd.h>

int main(void)
{
    char *args[3] = { "rip van winkle", "30", 0 };
    execv("./bash.script", args);
    return 1;
}

This yields the ps output:

bash script at sleep (0: ./bash.script; *: 30)
  PID TTY           TIME CMD
 7804 ttys000    0:00.11 -bash
 7829 ttys000    0:00.00 /bin/bash ./bash.script 30
 7832 ttys000    0:00.00 sleep 30

There are two possibilities as I see it:

  1. The kernel juggles the command line when executing the script via the shebang ('#!/bin/bash') line, or
  2. Bash itself dinks with its argument list.

How to establish the difference? I suppose copying the shell to an alternative name, and then using that alternative name in the shebang would tell us something:

$ cp /bin/bash jiminy.cricket
$ sed "s%/bin/bash%$PWD/jiminy.cricket%" bash.script > tmp
$ mv tmp bash.script
$ chmod +x bash.script
$ ./x & sleep 1; ps
[1] 7851
bash script at sleep (0: ./bash.script; *: 30)
  PID TTY           TIME CMD
 7804 ttys000    0:00.12 -bash
 7851 ttys000    0:00.01 /Users/jleffler/tmp/soq/jiminy.cricket ./bash.script 30
 7854 ttys000    0:00.00 sleep 30
$

This, I think, indicates that the kernel rewrites argv[0] when the shebang mechanism is used.


Addressing the comment by nategoose:

MiniMac JL: pwd
/Users/jleffler/tmp/soq
MiniMac JL: cat al.c
#include <stdio.h>
int main(int argc, char **argv)
{
    while (*argv)
        puts(*argv++);
    return 0;
}
MiniMac JL: make al
cc     al.c   -o al
MiniMac JL: ./al a b 'c d' e
./al
a
b
c d
e 
MiniMac JL: cat bash.script
#!/Users/jleffler/tmp/soq/al

echo "bash script at sleep (0: $0; *: $*)"
sleep 30
MiniMac JL: ./x
/Users/jleffler/tmp/soq/al
./bash.script
30
MiniMac JL:

That shows that it is the shebang '#!/path/to/program' mechanism, rather than any program such as Bash, that adjusts the value of argv[0]. When a binary is executed, argv[0] is not adjusted. When a script is executed via the shebang, the kernel adjusts the argument list: argv[0] is the binary listed on the shebang; if there is an argument after the binary on the shebang line, that becomes argv[1]; the next argument is the name of the script file, followed by any remaining arguments from the execv() or equivalent call. The argv[0] originally passed to execv() is dropped.

MiniMac JL: cat bash.script
#!/Users/jleffler/tmp/soq/al -arg0
#!/bin/bash
#!/Users/jleffler/tmp/soq/jiminy.cricket

echo "bash script at sleep (0: $0; *: $*)"
sleep 30
MiniMac JL: ./x
/Users/jleffler/tmp/soq/al
-arg0
./bash.script
30
MiniMac JL: 
Jonathan Leffler
Yeah, and this makes it even more weird. In my test bash script (the target of the execve), I don't see the value of what execve has in arg[0] anywhere -- not in the environment, and not as $0.
EBM
Make a script `#!/home/me/print_args` and write a simple print arg program to be sure who fiddles with `argv[0]`
nategoose
A: 

That allows a program to have many names and to work slightly differently depending on which name it was called by.

Imagine a trivial program, e.g. print0.c compiled into print0:

#include <stdio.h>
int main(int argc, char **argv)
{
   printf("%s\n",argv[0]);
   return 0;
}

Running it as ./print0 would print "./print0". Make a symbolic link to it, e.g. print1, and run it as ./print1: it would print "./print1".

Now that was with a symlink. But with the exec*() functions, you can tell the program its name explicitly.

Artifact from *NIX, but nice to have nevertheless.

Dummy00001
See my comment to Jonathan Leffler above - using a test bash script as the target of the execve call, arg[0] does NOT show up anywhere. Perhaps a C program gets its arguments differently, but at least for a bash script, this feature you explain seems absent.
EBM
Yes, shell's exec doesn't allow that. One would have to use a symlink to call a program under a different name.
Dummy00001