For typical performance benchmarking, this is what I use:
- gprof/oprofile - for profiling CPU-intensive code
- netstat/ethereal (ethereal is now called Wireshark) - for network statistics
- iostat/sar - for I/O
- vmstat - for memory
- mpstat/sar - for CPU usage
Now you can isolate problems based on the output of these tools.
For example, if I/O is constant and within limits, you can eliminate I/O as the problem.
If CPU usage is heavy, as shown by mpstat, move on to profiling with gprof/oprofile.
Unless you use all of them together across different runs, it is difficult to identify the bottleneck.
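The elimination steps above could be sketched roughly like this. It is only an illustration: the function name `triage`, its parameters, and the threshold values are all my own assumptions, not fixed rules, and you would feed it averages read off your own mpstat/iostat/vmstat output.

```python
def triage(cpu_busy_pct, iowait_pct, swap_in_per_s,
           cpu_limit=80.0, iowait_limit=20.0):
    """Return the subsystems that still look suspicious for one run.

    The limits are illustrative assumptions; tune them for your system.
    """
    suspects = []
    if cpu_busy_pct >= cpu_limit:
        # Heavy CPU usage (from mpstat/sar): go deeper with a profiler.
        suspects.append("cpu: profile with gprof/oprofile")
    if iowait_pct >= iowait_limit:
        # High I/O wait: I/O cannot be eliminated yet.
        suspects.append("io: inspect per-device iostat/sar output")
    if swap_in_per_s > 0:
        # Any sustained swapping hints at memory pressure.
        suspects.append("memory: check vmstat swap columns")
    return suspects or ["no obvious bottleneck: compare more runs"]


print(triage(cpu_busy_pct=95.0, iowait_pct=2.0, swap_in_per_s=0))
```

Anything that does not show up in the returned list has been eliminated for that run, which is the same reasoning as above in code form.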
Note: You can write a script to run all of them together and store the results in a designated folder for each run.
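A minimal sketch of such a script, assuming the sysstat tools (iostat, mpstat, sar) and vmstat are installed. The names `monitor_commands`, `run_monitors`, and the `perf_runs` folder are hypothetical, and the sampling interval and count are arbitrary defaults. Tools that are not found are skipped rather than treated as errors.

```python
import shutil
import subprocess
from pathlib import Path


def monitor_commands(interval=5, count=12):
    """Build one command per tool: sample every `interval` seconds, `count` times."""
    i, c = str(interval), str(count)
    return {
        "vmstat": ["vmstat", i, c],                 # memory / swap
        "iostat": ["iostat", "-x", i, c],           # per-device I/O
        "mpstat": ["mpstat", "-P", "ALL", i, c],    # per-CPU usage
        "sar_net": ["sar", "-n", "DEV", i, c],      # network interfaces
    }


def run_monitors(run_name, base_dir="perf_runs", interval=5, count=12):
    """Run all monitors in parallel and store each log under base_dir/run_name."""
    run_dir = Path(base_dir) / run_name
    run_dir.mkdir(parents=True, exist_ok=True)
    started = []
    for name, cmd in monitor_commands(interval, count).items():
        if shutil.which(cmd[0]) is None:
            # Leave a marker so a missing tool is visible in the run folder.
            (run_dir / f"{name}.skipped").write_text(f"{cmd[0]} not installed\n")
            continue
        log = open(run_dir / f"{name}.log", "w")
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        started.append((proc, log))
    # Wait for every monitor to finish its samples, then close the logs.
    for proc, log in started:
        proc.wait()
        log.close()
    return run_dir


if __name__ == "__main__":
    print(run_monitors("run1", interval=1, count=3))
```

Start it just before kicking off the workload you are benchmarking, and use a new run name each time so the folders can be compared side by side.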