I've used two profiling tools, VTune on Windows and dbx (within Sun Studio) on Solaris, that can profile a program without rebuilding it, and during profiling the program runs at the same speed as it normally does. Both of these features have saved me a lot of time.
Now I want to know whether there are any free tools on Linux that can do the same thing. I think I need a sampling-based profiler. VTune is good but expensive... I've heard of gprof and Valgrind, but gprof seems to require instrumenting the program (so we have to rebuild it), and Valgrind slows program execution down quite a lot. (According to Valgrind's introduction, Cachegrind runs programs about 20-100x slower than normal, and Callgrind, which is the tool I would need for profiling, is based on Cachegrind.)
For profiling, I just need to measure the execution time of function calls so I can find out where the performance degradation happens. I don't actually need much of the low-level profiling information that Cachegrind provides...