From source code, you can use Doxygen and GraphViz even if you don't already use Doxygen to document your code. It is possible to configure it so that it will include all functions and methods whether or not they have documentation comments. With AT&T Graphviz installed, Doxygen will include call and caller graphs for most functions and methods.
From object code, I don't have a ready answer. I would imagine that this would be highly target dependent since even with the debug information present, it would have to parse the object code to find calls and other stack users. In the worst case, that approach seems like it would require effectively simulating the target.
At runtime on the target hardware your choices are going to depend in part on what kind of embedded OS is present, and how it manages stacks for each thread.
A common approach is to initialize each stack to a known value that seems unlikely to be commonly stored in automatic variables. An interrupt handler or a thread can then inspect the stack(s) and measure an approximate high water mark.
Even without pre-filling the stack and later walking it to look for footprints, an interrupt could just sample the current value of the stack pointer (for each thread) and keep a record of its greatest observed extent. That would require storage for a copy of each threads SP, and the interrupt handler wouldn't have very much work to do to maintain the information. It would have to access the saved states of all the active threads, of course.
I don't know of a tool that does this explicitly.
If you happen to be using µC/OS-II from Micrium as your OS, you might take a look at their µC/Probe product. I haven't used it myself, but it claims to allow a connected PC to observe program and OS state information in near real time. I wouldn't be surprised if it is adaptable to another RTOS if needed.