Shark on Mac OS X is a great tool for profiling an application on a running system. Are there any similar tools for Linux?
OProfile looks like it could be one. Has anyone used it?
You could try Valgrind (http://valgrind.org/). It is a suite of runtime instrumentation tools, several of which (such as Callgrind and Cachegrind) are profilers.
Extending another answer, I use the 'callgrind' tool of valgrind (http://valgrind.org). Then install kcachegrind from KDE for a nice GUI.
As a dummy's tutorial, do:
1) Compile your application with debugging information. It's a good idea to try profiling with optimisation both on and off. With optimisation off you will get more information, but it may be less accurate (in particular, tiny functions will seem to take up more time than they deserve).
2) Run with:
valgrind --tool=callgrind <name of your app> <your app's options>
This should produce a file called 'callgrind.out.<pid>' (where <pid> is the process ID of the run), which you can load into kcachegrind.
You can also look at:
valgrind --tool=cachegrind <name of your app> <your app's options>
Which will give you information about how your app is interacting with your CPU's cache.
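To put the two steps together, here is a rough sketch of the whole workflow as a script. The names myapp.c, ./myapp and the output file names are placeholders I made up, so substitute your own. It assumes gcc and valgrind are installed and simply skips the run if they are not.

```shell
#!/bin/sh
# Hypothetical names: replace with your own source file and binary.
SRC=myapp.c
APP=./myapp
CALLGRIND_OUT=callgrind.out.myapp
CACHEGRIND_OUT=cachegrind.out.myapp

# Step 1: build with debugging information (-g).
# Swap -O2 for -O0 to compare optimised and unoptimised profiles.
if [ -f "$SRC" ] && command -v gcc >/dev/null 2>&1; then
    gcc -g -O2 -o "$APP" "$SRC"
fi

# Step 2: profile, but only if the binary and valgrind are both available.
if [ -x "$APP" ] && command -v valgrind >/dev/null 2>&1; then
    # Call-graph profile; a fixed output name avoids hunting for the pid.
    valgrind --tool=callgrind --callgrind-out-file="$CALLGRIND_OUT" "$APP"
    # Text summary, handy when kcachegrind is not installed.
    callgrind_annotate "$CALLGRIND_OUT" | head -n 30

    # Cache behaviour; --cache-sim=yes is passed explicitly because
    # recent valgrind releases no longer simulate the cache by default.
    valgrind --tool=cachegrind --cache-sim=yes \
        --cachegrind-out-file="$CACHEGRIND_OUT" "$APP"
    cg_annotate "$CACHEGRIND_OUT" | head -n 30
else
    echo "skipping: need an executable $APP and valgrind on PATH"
fi
```

After it finishes, 'kcachegrind callgrind.out.myapp' opens the call-graph profile in the GUI.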
Note that while valgrind and shark seem like similar apps, they work very differently. Valgrind runs your app on a simulated CPU and counts every instruction, whereas shark samples the app while it runs at full speed. As a result, an app under valgrind will run many times slower than normal (often over 40 times slower), but the results you get are much more accurate than shark's. I tend to use both, so I can get as much information as possible!
OProfile is a tool that does sampling-based profiling of both your application and the system calls it makes. This lets you see in detail where the time is being spent. It doesn't have a GUI, but there are several front-ends that will let you process the information from the runs.
I've used it extensively, both for desktop applications and for embedded systems. It takes a little effort to interpret the results, but the callgraph output is really useful here.
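For anyone who wants a starting point, a typical session might look something like the sketch below. It assumes a reasonably recent OProfile (0.9.8 or later, which ships the operf front-end) and a binary built with -g. The name ./myapp is a placeholder.

```shell
#!/bin/sh
# Hypothetical binary name: replace with your own, built with -g.
APP=./myapp

if [ -x "$APP" ] && command -v operf >/dev/null 2>&1; then
    # Sample the app and record call-graph data.
    # Results go to ./oprofile_data in the current directory.
    operf --callgraph "$APP"

    # Flat per-symbol breakdown of where the samples landed.
    opreport --symbols | head -n 30

    # Caller/callee view built from the recorded call graph.
    opreport --callgraph | head -n 60
else
    echo "skipping: need an executable $APP and operf on PATH"
fi
```

The call-graph report is the verbose one, so it usually pays to read the flat symbol list first and then look up the hot functions in the call-graph output.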