Suppose you have this program:
main() calls A() calls B() calls C(), and C hangs in a loop for 10 seconds.
The self time of C would be 10 seconds, or 100%. The self time of main, A, and B would be essentially zero.
The total (inclusive) time of every one of them would be 10 seconds, or 100%, because each of them is on the call stack for the whole run. You don't add those up.
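To make the arithmetic concrete, here is a minimal sketch in Python that derives self and total percentages from a list of call-stack samples. The helper name `self_and_total` and the ten hand-written samples are assumptions for illustration, not output from any real profiler.

```python
from collections import Counter

def self_and_total(samples):
    """Return (self_pct, total_pct) dicts from a list of call-stack samples.

    Each sample is a list of function names, outermost caller first.
    """
    self_hits = Counter()
    total_hits = Counter()
    for stack in samples:
        self_hits[stack[-1]] += 1      # only the innermost frame gets self time
        for fn in set(stack):          # count a function once per sample,
            total_hits[fn] += 1        # even if it appears recursively
    n = len(samples)
    self_pct = {fn: 100.0 * self_hits[fn] / n for fn in total_hits}
    total_pct = {fn: 100.0 * total_hits[fn] / n for fn in total_hits}
    return self_pct, total_pct

# Hypothetical data: every sample catches the program inside C.
samples = [["main", "A", "B", "C"]] * 10
self_pct, total_pct = self_and_total(samples)
print("self: ", self_pct)
print("total:", total_pct)
```

Every function shows 100% total, but only C shows any self time, which is why summing the total column is meaningless.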
Some profilers only count execution time, not I/O or other blocked time. If C spent its 10 seconds doing I/O rather than simply looping, then in such a profiler all the times would be zero, because the profiler is blind to I/O.
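The difference between the two clocks can be seen with a small sketch using Python's standard `time` module. Here `time.sleep` is only a stand-in for blocked I/O, and the function names `measure`, `blocked`, and `busy` are made up for the example.

```python
import time

def measure(fn):
    """Return (wall_seconds, cpu_seconds) spent running fn()."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    fn()
    return time.perf_counter() - wall0, time.process_time() - cpu0

def blocked():
    time.sleep(0.2)                       # stand-in for C blocking on I/O

def busy():
    end = time.perf_counter() + 0.2
    while time.perf_counter() < end:      # stand-in for C spinning in a loop
        pass

blocked_wall, blocked_cpu = measure(blocked)
busy_wall, busy_cpu = measure(busy)
print(f"blocked: wall={blocked_wall:.3f}s cpu={blocked_cpu:.3f}s")
print(f"busy:    wall={busy_wall:.3f}s cpu={busy_cpu:.3f}s")
```

Both calls take about the same wall-clock time, but the blocked one should report almost no CPU time, which is all a CPU-only profiler would see.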
A better type of profiler is one that samples the call stack on wall-clock time, not CPU time, reports "total" (i.e. inclusive) time as a percentage of the whole run, and does so at the line-of-code level, not just for functions. That's useful because the inclusive percentage of a line is a direct measure of how much could be saved if that line were executed less, and almost no problem can hide from it. Examples of such profilers are Zoom and LTProf, and I'm told OProfile can do it. There's also a simple method that works with any language and requires only a debugger.
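As a rough illustration of the idea (not how Zoom, LTProf, or OProfile are implemented), here is a minimal wall-clock stack sampler in Python. It uses a background thread and `sys._current_frames()` to snapshot the calling thread's stack at a fixed interval, and counts each (function, line) once per sample. The names `sample_stacks`, `a`, `b`, and `c` are assumptions for the example.

```python
import sys
import threading
import time
import traceback
from collections import Counter

def sample_stacks(target, interval=0.01):
    """Run target() while sampling its call stack on wall-clock time.

    Returns (line_hits, n_samples); line_hits maps (function, line number)
    to the number of samples in which that line was on the stack.
    """
    line_hits = Counter()
    n_samples = 0
    done = threading.Event()
    target_id = threading.get_ident()      # the thread that will run target()

    def sampler():
        nonlocal n_samples
        while not done.wait(interval):     # wakes every `interval` seconds
            frame = sys._current_frames().get(target_id)
            if frame is None:
                continue
            stack = traceback.extract_stack(frame)
            n_samples += 1
            # Inclusive count: each line on the stack, once per sample.
            for key in {(f.name, f.lineno) for f in stack}:
                line_hits[key] += 1

    t = threading.Thread(target=sampler, daemon=True)
    t.start()
    try:
        target()
    finally:
        done.set()
        t.join()
    return line_hits, n_samples

def c():
    time.sleep(0.3)    # blocked time, still visible to a wall-clock sampler

def b():
    c()

def a():
    b()

hits, n = sample_stacks(a)
for (name, lineno), count in hits.most_common(5):
    print(f"{100.0 * count / n:5.1f}%  {name}:{lineno}")
```

The call lines inside `a`, `b`, and `c` should each appear in nearly all samples, even though the program is sleeping rather than computing. Frames belonging to whatever called `sample_stacks` show up as well, since they are also on the stack.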
Here's a discussion of the issues.