Consider the very simple:

int main(void) {
    return 0;
}

I compiled it (with mingw32-gcc) and executed it as main.exe foo bar.

Now, I had expected some sort of crash or error caused by a main function explicitly declared as being bereft of parameters. The lack of errors led to this question, which is really four questions.

  • Why does this work? Answer: Because the standard says so!

  • Are the input parameters just ignored or is the stack prepared with argc & argv silently? Answer: In this particular case, the stack is prepared.

  • How do I verify the above? Answer: See rascher's answer.

  • Is this platform dependent? Answer: Yes, and no.

+2  A: 

In most compilers, __argc and __argv exist as global variables from the runtime library. The values will be correct.

On Windows, they won't be correct if the entry point has the UTF-16 signature, which is also the only way of getting the right command arguments on that platform. They will be empty in that case, but this is not your case, and there are two wide-character alternative variables.

Pavel Radzivilovsky
But are they also pushed onto the stack prior to the call to main?
manneorama
totally platform dependent. afaik on windows they won't. in most cases, main is not the entry point of the program. also, you cannot really tell apart stack variables before main and data segment global vars. why is the question?
Pavel Radzivilovsky
+3  A: 

In classic C, you can do something similar:

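void f() {}   /* defined without a prototype (pre-C23 meaning of empty parentheses) */

int main(void) {
    f(5, 6);  /* a classic C compiler accepts this call; f never looks at 5 and 6 */
    return 0;
}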

There is nothing stopping you from calling a function with a different number of arguments than its definition assumes. (Modern compilers, naturally, consider this an egregious error and will strongly resist actually compiling the code.)

The same thing happens with your main() function. The C runtime library will call

main(argc, argv);

but the fact that your function is not prepared to receive those two arguments is of no consequence to the caller.

Greg Hewgill
+6  A: 

From the C99 standard:

5.1.2.2.1 Program startup

The function called at program startup is named main. The implementation declares no prototype for this function. It shall be defined with a return type of int and with no parameters:

int main(void) { /* ... */ }

or with two parameters (referred to here as argc and argv, though any names may be used, as they are local to the function in which they are declared):

int main(int argc, char *argv[]) { /* ... */ }

or equivalent; or in some other implementation-defined manner.

Tim Schaeffer
I was hoping for a quote from the standard. That's one down. Thank you.
manneorama
it says basically that `int main(void)` and `int main()` are both good; see my answer
ShinTakezou
+1  A: 
  1. Why it works: Generally, function arguments are passed in specific places (registers or stack, usually). A function without arguments will never check them, so their contents are irrelevant. This depends on calling and naming conventions, but see #4.

  2. The stack will typically be prepared. On platforms where argv is parsed by the runtime library, such as DOS, the compiler may choose not to link in the code if nothing uses argv, but that is complexity few deem necessary. On other platforms, argv is prepared by exec() before your program is even loaded.

  3. Platform dependent, but on Linux systems, for instance, you can in fact examine the argv contents in /proc/PID/cmdline whether or not they're used. Many platforms also provide separate calls to find arguments.

  4. As per the standard quoted by Tim Schaeffer, main does not need to accept the arguments. On most platforms, the arguments themselves will still exist, but a main() without arguments will never know of them.

Yann Vernier
+8  A: 

I don't know the cross-platform answer to your question. But this made me curious. So what do we do? Look at the stack!

For the first iteration:

test.c

int main(void) {
   return 0;
}

test2.c

int main(int argc, char *argv[]) {
   return 0;
}

And now look at the assembly output:

$ gcc -S -o test.s test.c 
$ cat test.s 
        .file   "test.c"
        .text
.globl main
        .type   main, @function
main:
        pushl   %ebp
        movl    %esp, %ebp
        movl    $0, %eax
        popl    %ebp
        ret
        .size   main, .-main
        .ident  "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
        .section        .note.GNU-stack,"",@progbits

Nothing exciting here. Except for one thing: both C programs have the same assembly output!

This basically makes sense; we never really have to push/pop anything off of the stack for main(), since it's the first thing on the call stack.

So then I wrote this program:

int main(int argc, char *argv[]) {
   return argc;
}

And its asm:

main:
        pushl   %ebp
        movl    %esp, %ebp
        movl    8(%ebp), %eax
        popl    %ebp
        ret

This tells us that "argc" is located at 8(%ebp).

So now for two more C programs:

int main(int argc, char *argv[]) {
__asm__("movl    8(%ebp), %eax\n\t"
        "popl    %ebp\n\t"
        "ret");
        /*return argc;*/
}


int main(void) {
__asm__("movl    8(%ebp), %eax\n\t"
        "popl    %ebp\n\t"
        "ret");
        /*return argc;*/
}

We've stolen the "return argc" code from above and pasted it into the asm of these two programs. When we compile and run these, and then invoke echo $? (which echoes the exit status of the previous process), we get the "right" answer. So when I run "./test a b c d", $? gives me "5" for both programs, even though only one has argc/argv defined. This tells me that, on my platform, argc is for sure placed on the stack. I'd bet that a similar test would confirm this for argv.

Try this on Windows!

rascher
Whether they are present depends on the startup code, not on how main is defined. Theoretically, compilers can choose which startup code to link according to the definition; I would coin a new expression for this, "reversed overloading" of main (the caller changes according to how the callee is "defined"). It did not happen in any test I was able to do: the startup code is always the same regardless of how main is defined. And this is why I prefer `int main()` over `int main(void)` on systems where the startup code passes two args (argc and argv).
ShinTakezou
+1  A: 

A few remarks are in order.

The standard basically says what main most likely is: a function taking no arguments, or a function taking two arguments, or whatever else!

See for example my answer to this question.

But your question points to other facts.

Why does this work? Answer: Because the standard says so!

It is not correct. It works for other reasons. It works because of the calling conventions.

The convention can be: arguments are pushed on the stack, and the caller is responsible for cleaning up the stack. Because of this, in the actual asm code the callee can totally ignore what is on the stack. A call looks like

   push value1
   push value2
   call function
   add esp, 8

(intel examples, just to stay in the mainstream).

What the function does with the arguments pushed on the stack is totally uninteresting; everything will still work fine! And this is indeed true even if the calling convention is different, e.g.

   li  $a0, value
   li  $a1, value
   jal function

Whether or not the function takes the registers $a0 and $a1 into account does not change anything.

So the callee can ignore the arguments without harm: it can believe they do not exist, or it can know they exist but prefer to ignore them. (The opposite case would be problematic: the callee reading values from the stack or registers when the caller passed nothing.)

This is why things work.

From the C point of view, if we are on a system where the startup code calls main with two arguments (int and char **) and expects an int return value, the "right" prototype would be

 int main(int argc, char **argv) { }

But let us suppose now that we do not use these arguments.

Is it more correct to say int main(void) or int main() (still on the same system, where the implementation calls main with two args and expects an int return value, as said before)?

Indeed, the standard does not say what we have to do. The correct "prototype" that says that we have two arguments is still the one shown before.

But from a logical point of view, the right way of saying that there are arguments (we know it) but we are not interested in them, is

 int main() { /* ... */ }

In this answer I've shown what happens if we pass arguments to a function declared as int func() and what happens if we pass arguments to a function declared as int func(void).

In the second case we get an error, since (void) explicitly says the function takes no arguments.

With main we can't get an error, since there is no real prototype mandating the arguments, but it is worth noting that gcc -std=c99 -pedantic gives no warning for either int main() or int main(void). This means that either 1) gcc is not C99 compliant even with the std flag, or 2) both forms are standard compliant. Option 2 is the more likely.

One is explicitly standard compliant (int main(void)); the other is in effect int main(int argc, char **argv), but without spelling out the arguments, since we are not interested in them.

int main(void) works even when arguments exist, because of what I've written before. But it states that main takes no arguments, and wherever we could have written int main(int argc, char **argv) that statement is false, so int main() should be preferred instead.

Another interesting thing to notice is that if we say main does not return a value (void main()) on a system where the implementation expects a return value, we obtain a warning. This is because the caller expects to do something with that value, so it is "undefined behaviour" if we do not return one (which does not mean putting an explicit return in main, but declaring main as returning an int).

In many startup codes I've seen the main is called in one of these ways:

  retval = main(_argc, _argv);
  retval = main(_argc, _argv, environ);
  retval = main(_argc, _argv, environ, apple); // apple specific stuff

But there can exist startup code that calls main differently, e.g. retval = main(). To reflect that case we can use int main(void); on the other hand, using int main(int argc, char **argv) would compile, but make the program crash if we actually use the arguments (since the retrieved values will be rubbish).

Is this platform dependent?

The way main is called is platform dependent (implementation specific), as allowed by the standards. The "supposed" main prototype is a consequence of that and, as already said, if we know there are arguments passed in but we will not use them, we should use int main() as a short form of the longer int main(int argc, char **argv), while int main(void) means something different: that main takes no arguments (which is false on the system we are thinking about).

ShinTakezou
int main() is an artifact from the pre-ANSI days. ANSI added prototypes for a reason: to provide type-checking, and while it may not be that important in main(), it sure is in other functions. Try adding -Wstrict-prototypes to your compile line and see what GCC says.
ninjalj
First of all, an option that tells the compiler to warn about a situation says nothing about whether that situation is correct; if it were non-standard, gcc would have complained with or without `-Wwhatever`. Such options are just a help to kill/avoid bugs. If a prototype for main were explicitly given, we should __always__ write `int main(void)` or `int main(int argc, char **argv)` according to that proto. Since on a system passing two args the proto would be the latter, we should always write it that way, even if we are not interested in argc/argv. >>>
ShinTakezou
>> When we know that the system passes in 2 args, we should always use `int main(int argc, char **argv)`, not `int main(void)`, to match what is really going on; `int main(void)` would mean the "system" does not pass any args, which is false, unless the compiler implements a sort of "reversed overloading" of main. Most of the time that does not happen, and argc/argv are passed anyway, if the "implementation" allows for them. `int main(void)` would work anyway in the vast majority of cases, basically because there is no pascal-like calling convention in use; if there were, `int main()` would fail too >>
ShinTakezou
>> and the only usable option would always be `int main(int argc, char **argv)`. (I mean: if a pascal-like calling convention were in effect, the callee would have to match exactly how it is called, since it is responsible for cleaning up the stack according to how many args are passed, and the only way the compiler can know that is through the proto; in that case, this whole debate would have ended before it started.)
ShinTakezou