So basically the question is how to locate the bottleneck(s). Profiling is one approach, and it usually works fine if used correctly. Look at the stacks with the highest inclusive time to find which component is the problem, and at exclusive time to locate the problem functions. Also take a look at the number of executions; you often find surprises there, in small functions that are executed way too often.
Another approach is performance counters, and this is my favorite because it is far less invasive than profiling and can be used again and again in a production deployment. I add counters generously to my code (see Using XSLT to generate Performance Counters code for a way to avoid typing a tonne of repetitive code) and instrument the code with calls to IncrementXXX. Incrementing a performance counter is so cheap that the calls can be left in Release production code, and this is why I prefer this method. For instance, say I have a piece of code that makes a database call, then a web service request, then another database call. Overall it is slow, but where? I can instrument the code like this:
void MyFunction()
{
    // +1 on entry, -1 on exit: each counter shows how many calls are inside that region right now
    CountersManager.IncrementMyFunction(1);

    CountersManager.IncrementFirstDBCall(1);
    dataAccess.FirstCall();
    CountersManager.IncrementFirstDBCall(-1);

    CountersManager.IncrementWebCall(1);
    webRequest.MakeCall();
    CountersManager.IncrementWebCall(-1);

    // someCode.. moreCode (not instrumented)

    CountersManager.IncrementSecondDBCall(1);
    dataAccess.SecondCall();
    CountersManager.IncrementSecondDBCall(-1);

    CountersManager.IncrementMyFunction(-1);
}
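One caveat with paired calls like these: if the instrumented call throws, the matching decrement is skipped and the counter stays inflated. A minimal sketch of one way around that follows. CounterScope is a hypothetical helper of my own, not part of CountersManager, and the Demo class uses a plain int as a stand-in for a real performance counter:

```csharp
using System;
using System.Threading;

// Hypothetical helper: increments on creation and decrements on Dispose,
// so the counter is restored even when the wrapped code throws.
public sealed class CounterScope : IDisposable
{
    private readonly Action<int> _increment;

    public CounterScope(Action<int> increment)
    {
        _increment = increment;
        _increment(1);    // entering the instrumented region
    }

    public void Dispose()
    {
        _increment(-1);   // leaving it, on success or on exception
    }
}

public static class Demo
{
    private static int _inFlight;   // stand-in for a real performance counter

    public static void Main()
    {
        try
        {
            using (new CounterScope(delta => Interlocked.Add(ref _inFlight, delta)))
            {
                throw new InvalidOperationException("simulated failure");
            }
        }
        catch (InvalidOperationException)
        {
            // swallowed for the demo; the scope has already been disposed
        }

        Console.WriteLine(_inFlight);   // prints 0: the decrement still ran
    }
}
```

With the code from MyFunction this would read something like using (new CounterScope(d => CountersManager.IncrementWebCall(d))) { webRequest.MakeCall(); }, assuming IncrementWebCall accepts an integer delta as the calls above suggest.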
The paired increment/decrement calls in MyFunction are the simplest instrumentation: trivial to add, and although the information they provide is minimal, it can give insight into what happens. Say I collect performance counters and find that in most samples the MyFunction counter is 500, the FirstDBCall counter is 50, the WebCall counter is 300 and the SecondDBCall counter is 50. That tells me that, on average, of the 500 calls that are inside MyFunction, 300 (about 60%) were captured during the web call, so that is where most of the time is spent. The remaining 100 calls were somewhere in the code that is not instrumented. But I can go further: add counters that measure elapsed time (increment a counter of type AverageTimer32 and an AverageBase one) and the collected counters will give me the average duration of the operation. And so on and so forth.
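As a rough sketch of the AverageTimer32 idea, here is what timing the web call could look like with System.Diagnostics.PerformanceCounter. The category name "MyApp", the counter names, and the WebCallTiming class are all made up for illustration. I am also assuming the category has already been registered, with the AverageTimer32 counter immediately followed by its AverageBase counter, and that this runs on Windows (on newer .NET versions the API lives in the System.Diagnostics.PerformanceCounter package):

```csharp
using System;
using System.Diagnostics;

public static class WebCallTiming
{
    // Hypothetical category and counter names; they must match whatever
    // was registered when the counter category was created.
    private static readonly PerformanceCounter AvgTime =
        new PerformanceCounter("MyApp", "Web Call Avg Time", false);
    private static readonly PerformanceCounter AvgTimeBase =
        new PerformanceCounter("MyApp", "Web Call Avg Time Base", false);

    public static void Measure(Action operation)
    {
        long start = Stopwatch.GetTimestamp();
        try
        {
            operation();
        }
        finally
        {
            // AverageTimer32 takes elapsed timer ticks in the timer counter
            // and the number of completed operations in the base counter.
            AvgTime.IncrementBy(Stopwatch.GetTimestamp() - start);
            AvgTimeBase.Increment();
        }
    }
}
```

The call site would then be something like WebCallTiming.Measure(() => webRequest.MakeCall());. The try/finally means a failed call is still counted, which may or may not be what you want; a separate failure counter is another option.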
And last but not least, you can look at all the existing counters in the products you use. The drawback is that understanding where to look and how to interpret the numbers you see requires knowledge of said counters, or access to good troubleshooting tips. There are plenty for products like SQL Server, for instance, but for other products in your stack it may be more difficult to find good documentation on this topic.