gcc performs tail call optimization (gcc calls it sibling call optimization). In func1() and func2() it does not call func2()/func3(); instead it jumps to them, so func3() returns directly to main().
In your case, func1() and func2() do not need to set up a stack frame. Even if they did (e.g. for local variables), gcc could still do the optimization as long as the function call is the last thing the function does: it then tears down its stack frame before jumping to func3().
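Here is a minimal sketch of that call chain. The function bodies and the int parameter are made up for illustration (your functions will look different), and the noinline attribute is only there so gcc keeps the three functions separate instead of inlining them into each other.

```c
/* Hypothetical bodies; only the shape of the call chain matters. */

/* Last function in the chain: its "ret" goes back to whoever
 * originally called func1(), e.g. main(). */
__attribute__((noinline)) int func3(int x)
{
    return x + 3;
}

/* The call to func3() is the last thing func2() does (a tail call),
 * so with optimization enabled gcc will typically emit a jump here
 * instead of a call. */
__attribute__((noinline)) int func2(int x)
{
    return func3(x + 2);
}

/* Same pattern: func2() is called in tail position. */
__attribute__((noinline)) int func1(int x)
{
    return func2(x + 1);
}
```

The result is the same with or without the optimization; only the call/return mechanics change, which is why func1() and func2() can be missing from a backtrace taken inside func3().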
Have a look at the generated assembly code (e.g. compile with gcc -S) to see it.
Edit/Update:
To verify that this is the reason, do something after the function call that the compiler cannot reorder (e.g. use the return value). Or just try compiling with -O0 or with -fno-optimize-sibling-calls.
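A sketch of the "use the return value" variant, again with hypothetical names and bodies. Because the multiplication depends on the callee's result, the call is no longer in tail position, so the caller has to stay on the stack until the callee returns.

```c
/* Hypothetical leaf function, standing in for func3(). */
__attribute__((noinline)) int func3_leaf(int x)
{
    return x + 3;
}

/* Stand-in for func2(): the result of the call is used afterwards,
 * so gcc has to emit a real call and return here before finishing. */
__attribute__((noinline)) int func2_keep_frame(int x)
{
    int r = func3_leaf(x + 2);
    return r * 2;   /* work that depends on the call's result */
}
```

With this change the caller should show up again between func3_leaf() and main() when you inspect the stack.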