A:

gcc does tail-call (sibling-call) optimization. In func1() and func2() it does not call func2()/func3(); instead, it jumps to func2()/func3(), so func3() returns directly to main().

In your case, func1() and func2() do not need to set up a stack frame, but even if they did (e.g. for local variables), gcc can still do the optimization as long as the function call is the last thing the function does: it then cleans up the stack before jumping to func3().

Have a look at the generated assembler code to see it.


Edit/Update:

To verify that this is the reason, do something after the function call that the compiler cannot reorder (e.g. use the return value). Or just try compiling with -O0, or with -fno-optimize-sibling-calls to disable only this optimization.
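A minimal sketch of the two cases, using a hypothetical reduction of the call chain (the function bodies are made up; `noinline` is a GCC attribute that keeps each function a separate symbol). In `func1()` and `func2()` the call is the last thing the function does, so at -O2 GCC may emit a branch instead of a call. In the hypothetical `func2_used()` the return value is used after the call, so control has to come back and a real call is needed.

```c
/* Hypothetical stand-ins for the real functions. */
__attribute__((noinline)) int func3(int x)
{
    return x * 2;
}

/* The call is in tail position: at -O2 GCC may emit a branch to func3,
 * so func3 returns directly to whoever called func2. */
__attribute__((noinline)) int func2(int x)
{
    return func3(x + 1);
}

/* Same pattern one level up: func1 may branch to func2. */
__attribute__((noinline)) int func1(int x)
{
    return func2(x + 1);
}

/* The result is used after the call, so func2_used must regain control:
 * this cannot be turned into a tail call. */
__attribute__((noinline)) int func2_used(int x)
{
    int r = func3(x + 1);
    return r + 1;
}
```

Compiling with `gcc -O2 -S` (or the cross-compiler equivalent) and comparing the functions should show the difference; on ARM a tail call typically appears as a plain `b` where an ordinary call would be `bl`/`blx`. With -O0 or -fno-optimize-sibling-calls, all of them become ordinary calls.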

IanH
Downvoted, he said he checked the assembler.
DeadMG
He says the functions are there (not inlined), but he did not say if he has checked if the functions are called or jumped to.
IanH
@DeadMG: The downvote is certainly harsh. Tail calls are usually optimised like this when compiling for ARM, and this optimisation would give exactly the observed results.
Mike Seymour
The OP specifically said he checked the disassembly.
DeadMG
@DeadMG: He said that he checked that the functions were called rather than inlined, but he may have missed the functions ending with a branch rather than a return. It's not something you'd notice unless you carefully read every instruction. Of course, your votes are yours to deal out as you see fit.
Mike Seymour
@DeadMG: Even with a look at the disassembly, if you don't know about this optimization you can easily overlook whether there is a call or a jump. I still think this is the problem here; the other answer is interesting, but it does not explain why only func3() and main() appear in the backtrace (rather than only func3() and func2()).
IanH
To clarify: the simplified toy code in the original post could have been subject to the return call/jump optimization, but in the actual code there are things on both sides of the call that cannot be optimized away (and I have verified that they are not). There is a push/pop at the start and end of each function, and the next function in the chain is called with a blx instruction (Thumb2).
hugov
A:

Since ARM platforms do not use a frame pointer, you never quite know how big the stack frame is, and you cannot simply unwind the stack beyond the single return address in R14 (the link register).

When investigating a crash for which we do not have debug symbols, we simply dump the whole stack and look up the closest symbol for each word that falls within the code address range. It generates a lot of false positives, but it can still be very useful for investigating crashes.
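A sketch of that stack scan in C, assuming you already have the raw stack words from the crash dump, the start and end of the code region, and a symbol table sorted by address (for example, generated offline from `nm -n` on the unstripped binary). `struct sym`, `nearest_symbol()` and `scan_stack()` are hypothetical names. Bit 0 is cleared before the lookup because Thumb return addresses have it set.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical symbol table entry; the table must be sorted by address. */
struct sym {
    uint32_t addr;
    const char *name;
};

/* Index of the last symbol whose address is <= addr, or -1 if there is none. */
long nearest_symbol(const struct sym *tab, size_t n, uint32_t addr)
{
    size_t lo = 0, hi = n;               /* binary search over [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].addr <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (long)lo - 1;                 /* lo = count of entries <= addr */
}

/* Print every stack word that points into [text_lo, text_hi) along with the
 * nearest preceding symbol. Returns the number of candidates printed. */
size_t scan_stack(const uint32_t *stack, size_t nwords,
                  uint32_t text_lo, uint32_t text_hi,
                  const struct sym *tab, size_t ntab)
{
    size_t hits = 0;
    for (size_t i = 0; i < nwords; i++) {
        uint32_t pc = stack[i] & ~1u;    /* clear the Thumb state bit */
        if (pc < text_lo || pc >= text_hi)
            continue;                    /* not a code address */
        long s = nearest_symbol(tab, ntab, pc);
        if (s < 0)
            continue;                    /* below the first known symbol */
        printf("sp+0x%04zx: 0x%08lx  %s+0x%lx\n", i * 4,
               (unsigned long)stack[i], tab[s].name,
               (unsigned long)(pc - tab[s].addr));
        hits++;
    }
    return hits;
}
```

Every hit is only a candidate: a function pointer, or a stale return address left in an unused stack slot, looks exactly the same as a live frame.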

If you are running pure ELF executables, you can separate the debug symbols from your release executable. gdb can then help you find out what is going on from a standard Unix core dump.

doron
+1 we've done something similar on MIPS
bstpierre
You could reduce the false positives by using the disassembled executable to manually reconstruct the stack frames; look at the first few instructions of each function to count the stacked registers, and any further adjustments to the stack pointer.
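A rough sketch of that prologue analysis for Thumb/Thumb-2 code, assuming the function's first few halfwords are available (read from the binary or from memory). It recognises only the 16-bit PUSH, the 32-bit PUSH.W and the 16-bit SUB SP, SP, #imm encodings; `struct frame_info`, `scan_prologue()` and `frame_bytes()` are hypothetical names, and anything else (SUB.W SP, VPUSH, frame-pointer setup, alloca) is not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Result of scanning a function prologue. */
struct frame_info {
    unsigned pushed_regs;   /* number of registers saved by PUSH */
    unsigned saves_lr;      /* 1 if LR was among them */
    unsigned sub_bytes;     /* extra SP decrement from SUB SP, SP, #imm */
};

static unsigned popcount16(uint16_t v)
{
    unsigned n = 0;
    for (; v; v &= (uint16_t)(v - 1))   /* clear the lowest set bit */
        n++;
    return n;
}

/* Decode prologue instructions until the first one that is not:
 *   16-bit PUSH {reglist[,LR]}   1011 010M rrrr rrrr
 *   32-bit PUSH.W {reglist}      0xE92D followed by the register mask
 *   16-bit SUB SP, SP, #imm7*4   1011 0000 1iii iiii               */
struct frame_info scan_prologue(const uint16_t *code, size_t nhalf)
{
    struct frame_info fi = {0, 0, 0};
    size_t i = 0;
    while (i < nhalf) {
        uint16_t hw = code[i];
        if ((hw & 0xFE00u) == 0xB400u) {             /* PUSH (T1) */
            unsigned m = (hw >> 8) & 1u;             /* M bit = LR */
            fi.pushed_regs += popcount16(hw & 0xFFu) + m;
            fi.saves_lr |= m;
            i += 1;
        } else if (hw == 0xE92Du && i + 1 < nhalf) { /* PUSH.W (T2) */
            uint16_t mask = code[i + 1];
            fi.pushed_regs += popcount16(mask);
            fi.saves_lr |= (mask >> 14) & 1u;        /* bit 14 = LR */
            i += 2;
        } else if ((hw & 0xFF80u) == 0xB080u) {      /* SUB SP, SP, #imm */
            fi.sub_bytes += (hw & 0x7Fu) * 4u;
            i += 1;
        } else {
            break;                                   /* end of recognised prologue */
        }
    }
    return fi;
}

/* Total bytes the prologue moved SP by. */
unsigned frame_bytes(struct frame_info fi)
{
    return fi.pushed_regs * 4u + fi.sub_bytes;
}
```

For example, `push {r7, lr}` (0xB580) followed by `sub sp, #8` (0xB082) gives a 16-byte frame. Given SP after the prologue, the caller's frame starts at SP + frame_bytes, and because PUSH stores the highest-numbered register at the highest address, the saved LR (if it was part of the first PUSH, the usual case) is at SP + frame_bytes - 4.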
Mike Seymour
Nitpick: some ARM platforms do use a frame pointer (usually `r11`). But that's not important here, since the questioner states that his platform doesn't.
Mike Seymour
Mike: yes, I could do that myself... but surely there is some code or library I can leverage that already does it? Surely, in the context of exceptions, every possible stack frame has to contain the necessary metadata (at a minimum, its size) to unwind up the stack. So, given that exception handling works, why can't gcc's own unwinder do this for me?
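GCC's runtime does expose its frame walker as `_Unwind_Backtrace()` in `<unwind.h>`. A sketch of using it, assuming the build emits unwind tables for every function on the path (for C code that typically means compiling with `-funwind-tables`; without tables for a given function the walk is likely to stop there). `collect_backtrace()` and `struct bt_state` are hypothetical names.

```c
#include <stdint.h>
#include <unwind.h>

/* State passed through _Unwind_Backtrace to the callback. */
struct bt_state {
    uintptr_t *pcs;
    int max;
    int count;
};

/* Called once per frame, innermost first. */
static _Unwind_Reason_Code bt_callback(struct _Unwind_Context *ctx, void *arg)
{
    struct bt_state *st = arg;
    if (st->count >= st->max)
        return _URC_END_OF_STACK;        /* buffer full: stop the walk */
    st->pcs[st->count++] = (uintptr_t)_Unwind_GetIP(ctx);
    return _URC_NO_REASON;               /* continue with the caller's frame */
}

/* Fill pcs[] with up to max return addresses; returns how many were stored. */
int collect_backtrace(uintptr_t *pcs, int max)
{
    struct bt_state st = { pcs, max, 0 };
    _Unwind_Backtrace(bt_callback, &st);
    return st.count;
}
```

The stored values are return addresses, i.e. they point just after each call instruction, and can be symbolised offline. If the list ends earlier than expected, missing unwind entries for the next function are a likely suspect.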
hugov
@hugov: exception handling needs to know which objects to destroy, where to jump to, and what state to restore the stack to. It doesn't need to know the complete call stack, so I wouldn't expect to be able to reconstruct a complete stack trace unless the compiler specifically chooses to support this. From your experience, I'm guessing it doesn't, but I could be wrong.
Mike Seymour
@Mike Seymour - Technically, ARM assembler does not even have the concept of a stack built into it. The closest we come is the LDM and STM instructions, so you are free to implement a stack any way you like. The ARM Procedure Call Standard, which is used for most standard ARM ABIs, does not support a frame pointer, but nothing other than compatibility will stop you from using one.
doron
@deus: Indeed, although Thumb has `push` and `pop` instructions which assume a full-descending stack with `r13` as the stack pointer, so the concept of a stack has slipped into assembly there. The current ABI doesn't have a concept of a frame pointer, but older ones had variants that did, to allow unwinding in the days when debugging information couldn't be relied on for that.
Mike Seymour
@Mike, see updated OP. Very curious!
hugov
A: 

Does your executable contain debugging information, from compiling with the -g option? I think this is required to get a full stack trace without a frame pointer.

You might need -gdwarf-2 to make sure it uses a format that includes unwind information.

Mike Seymour
Possible, although I'm pretty sure (like 99.9%) that the DWARF info doesn't actually make it into the binary image programmed into flash. How would I check?
hugov