This is a fundamental question, but an important one nonetheless...

When starting a C++ program whose main function has the following common signature:

int main(int argc, char* args[]) {
    //Magic!
    return 0;
}

Is args[0] always guaranteed to be the path to the currently running program? What about across platforms? (I am in a Linux environment but may port later on.)

+18  A: 

Not always. It is whatever value the caller hands to the operating system when starting the program. For example, when starting a program with exec, you can set it to an arbitrary value:

int execve(const char *filename, char *const argv[],
           char *const envp[]);

The first parameter is the file to start, and argv contains argv[0] and all the other parameters for main. envp contains the environment variables (it is not defined by Standard C or C++; this is a POSIX thing).
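To make that concrete, here is a minimal POSIX-only sketch that starts /bin/sh with a caller-chosen argv[0] and reads back what the shell reports as $0. The helper name name_seen_by_shell is made up for this illustration, and the sketch assumes that /bin/sh exists and that it reports argv[0] as $0 when run with -c.

```cpp
#include <cstddef>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs /bin/sh with the given argv[0] and returns the
// value the shell reports as $0. Returns an empty string on failure.
std::string name_seen_by_shell(const char* fake_argv0) {
    int fds[2];
    if (pipe(fds) != 0) return "";

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return "";
    }

    if (pid == 0) {
        // Child: send stdout into the pipe, then replace the process image.
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);

        // execv takes char* const[], so constness is cast away here;
        // the strings themselves are never modified.
        char* const child_argv[] = {
            const_cast<char*>(fake_argv0),  // becomes argv[0] in the child
            const_cast<char*>("-c"),
            const_cast<char*>("printf '%s' \"$0\""),
            nullptr
        };
        execv("/bin/sh", child_argv);
        _exit(127);  // only reached if execv failed
    }

    // Parent: collect everything the child printed.
    close(fds[1]);
    std::string out;
    char buf[256];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0) {
        out.append(buf, static_cast<std::size_t>(n));
    }
    close(fds[0]);
    waitpid(pid, nullptr, 0);
    return out;
}
```

The file that gets executed is always /bin/sh; only the name it is told it was invoked by changes. One caveat: a multi-call binary such as BusyBox picks its behaviour from argv[0], so there an unknown name would probably be rejected rather than echoed back.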

More precisely, this is the definition of argv in C++:

An implementation shall not predefine the main function. This function shall not be overloaded. It shall have a return type of type int, but otherwise its type is implementation-defined. All implementations shall allow both of the following definitions of main:

int main() { /* ... */ }

and

int main(int argc, char* argv[]) { /* ... */ }

In the latter form argc shall be the number of arguments passed to the program from the environment in which the program is run. If argc is nonzero these arguments shall be supplied in argv[0] through argv[argc-1] as pointers to the initial characters of null-terminated multibyte strings (NTMBSs) (17.3.2.1.3.2) and argv[0] shall be the pointer to the initial character of a NTMBS that represents the name used to invoke the program or "". The value of argc shall be nonnegative. The value of argv[argc] shall be 0. [Note: it is recommended that any further (optional) parameters be added after argv. ]

It's pretty much up to the implementation to define what a "name used to invoke the program" is. If you want the full path of your executable, you can use GetModuleFileName on Windows. On POSIX systems, you can combine argv[0] (the name used to execute the program, which may be relative) with getcwd (the current working directory) to try to make the name absolute.
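A rough sketch of the argv[0] plus getcwd idea, for POSIX systems only. The helper names current_dir and absolute_from_argv0 are invented for this example. It is a best-effort heuristic: it only makes sense if the working directory has not changed since startup, and it gives up when argv[0] contains no slash, since such a name was most likely found through PATH and says nothing about the directory.

```cpp
#include <cerrno>
#include <string>
#include <vector>
#include <unistd.h>

// Hypothetical helper: current working directory, or "" on failure.
// The buffer grows on ERANGE, so no PATH_MAX assumption is needed.
std::string current_dir() {
    std::vector<char> buf(256);
    while (getcwd(buf.data(), buf.size()) == nullptr) {
        if (errno != ERANGE) return "";
        buf.resize(buf.size() * 2);
    }
    return std::string(buf.data());
}

// Hypothetical helper: best-effort absolute form of argv[0].
// Returns "" when no sensible answer can be derived.
std::string absolute_from_argv0(const std::string& argv0,
                                const std::string& cwd) {
    if (argv0.empty()) return "";
    if (argv0[0] == '/') return argv0;                    // already absolute
    if (argv0.find('/') == std::string::npos) return "";  // bare name
    if (cwd.empty()) return "";
    if (cwd.back() == '/') return cwd + argv0;            // cwd is "/"
    return cwd + "/" + argv0;
}
```

In main this would be called as absolute_from_argv0(argv[0], current_dir()) before any chdir. The result is not normalised (a leading "./" stays in the path) and symlinks are not resolved.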

Johannes Schaub - litb
For example, a shell knows it's a login shell because login(1) arranges for argv[0] to begin with a dash.
Norman Ramsey
... and bash knows it's in POSIX mode by comparing argv[0] to "sh" :) incidentally, this argv trickery is what makes busybox so small. all utils are packed together in one binary which is just executed with different argv[0] values
Johannes Schaub - litb
I'd avoid realpath in portable programs. The POSIX realpath interface is broken, because you can't specify the size of the target buffer, nor allocate one big enough without making assumptions about PATH_MAX.
Doug
Doug, oh right. i just read it in the manpage.
Johannes Schaub - litb
+1  A: 

Here's what the C standard says that argv[0] should be:

If the value of argc is greater than zero, the string pointed to by argv[0] represents the program name; argv[0][0] shall be the null character if the program name is not available from the host environment.
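Read literally, that leaves two cases a careful program has to handle: argc may be zero, and argv[0] may be an empty string. A small defensive sketch (the helper name program_name and the fallback parameter are my own, not anything from the standard):

```cpp
#include <string>

// Hypothetical helper: returns argv[0] when it is usable, otherwise the
// caller-supplied fallback.
std::string program_name(int argc, char* argv[], const std::string& fallback) {
    // argc == 0 is permitted, in which case argv[0] is the null pointer.
    if (argc < 1 || argv == nullptr || argv[0] == nullptr) return fallback;
    // An empty string means the host environment could not supply a name.
    if (argv[0][0] == '\0') return fallback;
    return argv[0];
}
```

Even when this returns argv[0], the value is only the name the caller chose to pass, so it is suitable for messages but not for locating the executable.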

As for whether it contains the full path, the answer is that argv[0] does not necessarily contain the full path to the executable. On Windows it seems to be exactly what was provided on the command line. Dunno what Linux/Unix does.

Michael Burr
+4  A: 

No. On Windows, GetModuleFileName guarantees the exact full path of the currently executing program. On Linux there is a symlink, /proc/self/exe. Do a readlink on this symlink to get the full path of the currently executing program. Even if your program was called through a symlink, /proc/self/exe will always point to the actual program.
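For the Linux side, a minimal sketch of the readlink approach. The function name current_executable_path is my own, and this assumes /proc is mounted; it is not portable to other Unix systems as written.

```cpp
#include <cstddef>
#include <string>
#include <vector>
#include <unistd.h>

// Linux-specific sketch: resolves /proc/self/exe to the running binary's path.
// Returns "" if the link cannot be read (for example, /proc is not mounted).
std::string current_executable_path() {
    std::vector<char> buf(256);
    for (;;) {
        ssize_t n = readlink("/proc/self/exe", buf.data(), buf.size());
        if (n < 0) return "";
        if (static_cast<std::size_t>(n) < buf.size()) {
            // readlink does not null-terminate, so use the returned length.
            return std::string(buf.data(), static_cast<std::size_t>(n));
        }
        // The result filled the buffer and may be truncated; grow and retry.
        buf.resize(buf.size() * 2);
    }
}
```

Growing the buffer in a loop avoids hard-coding PATH_MAX, and an empty return value lets the caller fall back to argv[0] when /proc is unavailable.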

Tim Matthews
+1  A: 

It's so not-guaranteed that students used to try to hide the fact that they were playing Rogue on the school mainframe by writing C programs that would start it with argv[0] of "cc" or "tcsh".

chaos
With old enough versions of 'ps' and 'w' it even worked. :)
chaos