I've found that the time profiling in Shark is very accurate for identifying the bottlenecks in your code. You can make the sampling finer-grained by opening the Mini Config Editor (Config | Show Mini Config Editor) and lowering the sample interval.
Instruments in Xcode 3.2 now also has a nice Time Profiler instrument, although it only works for Mac applications. I've found that Instruments works well for profiling, but it can drop samples if the system is under heavy load. Generally, I start with Instruments because it's so easy to use, then move on to Shark if I need a more detailed view of what's going on.
If you really want to do function-call-based profiling, I'd look at DTrace. I've written a couple of articles about tuning Cocoa applications using DTrace here and here. The latter even shows an example of tuning the startup time of an iPhone application using a custom DTrace script.
Unfortunately, DTrace currently does not run on the iPhone itself, but you can still gather a lot of interesting information with it by running your application in the simulator. The timing information will be nowhere near what you'd see on the device, but knowing exactly which methods are executed, how many times, and in what order can give you clues about where to optimize. I use DTrace to provide a different perspective on the information gathered by Shark and Instruments, and to answer specific questions about my application.
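As a rough illustration of the "which methods, how many times" idea, here is a minimal Python sketch. It assumes macOS with DTrace and its Objective-C `objc` provider available, that it is run as root, and that you already know the PID of the app running in the simulator. It attaches to that process, counts method entries by class and selector for a fixed window, and then ranks them. The helper names (`d_program`, `dtrace_command`, `parse_counts`, `profile_method_calls`) are hypothetical, the parser assumes DTrace's default whitespace-separated aggregation output, and this is not the custom script from the articles.

```python
import subprocess


def d_program(seconds):
    # Count every Objective-C method entry in the traced process, keyed by
    # class (probemod) and selector (probefunc), then stop after `seconds`.
    return ("objc$target:::entry { @calls[probemod, probefunc] = count(); } "
            "tick-%ds { exit(0); }" % seconds)


def dtrace_command(pid, seconds=10):
    # -q suppresses per-probe headers (aggregations still print on exit);
    # -p attaches to an existing process and binds $target to its PID.
    return ["dtrace", "-q", "-p", str(pid), "-n", d_program(seconds)]


def parse_counts(output):
    """Parse 'class  method  count' lines into tuples, busiest first."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines and anything that doesn't end in a numeric count.
        if len(fields) < 3 or not fields[-1].isdigit():
            continue
        cls, count = fields[0], int(fields[-1])
        method = " ".join(fields[1:-1])
        rows.append((cls, method, count))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


def profile_method_calls(pid, seconds=10):
    # Assumption: caller has root privileges, since dtrace requires them.
    result = subprocess.run(dtrace_command(pid, seconds),
                            capture_output=True, text=True, check=True)
    return parse_counts(result.stdout)


if __name__ == "__main__":
    import sys
    for cls, method, count in profile_method_calls(int(sys.argv[1]))[:20]:
        print("%8d  %s %s" % (count, cls, method))
```

Call counts gathered this way say nothing reliable about on-device timing, so treat the ranking as a hint for where to point Shark or Instruments next rather than as a measurement.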