Jader, if you can run the app under an IDE and pause it at random, there is an unorthodox but very quick and effective way to find out what is responsible for the time. A few people know it, and I've tried to explain it at length. Here's my latest attempt. Good luck. Here's another explanation and an example.
ADDED: In your comment, you said that is just what sampling profilers do, only faster. Even if they actually sample the entire call stack, there are two big problems:
They summarize, often at the level of functions or methods, meaning that you have to hunt inside time-consuming functions to find the time-consuming statements. While you are hunting for that information, the stack samples actually have it but didn't show it to you. The rationale is that there are too many samples for you to look at, so they have to summarize. However, if some statement is on the call stack some percent of the time, like 50% (not unusual), then that is exactly what would be saved by its removal, and that is roughly the percent of samples that contain it. Ten samples would show it just as surely as 1000, so the extra samples are wasted, and the effort to include their information just causes obfuscation. Some profilers summarize at the level of statements, and that is an improvement, but it runs into the second issue:
You need to know the reason why the time-consuming code is being executed, and that information is not shown to you. The information is there, in the stack samples. If a statement is on the call stack on some fraction of the samples, just look at one or more of those samples. The call stack gives you the chain of justification of why the statement is being executed. Since you know why it's being done, you can tell if there is another way to accomplish the same thing that does not take so much time, such as using cached data from a prior invocation of that statement. Some profilers give you a call tree or a call graph, but those are summaries meaning you still have to puzzle out exactly why the statement is being called so much of the time.
I hope this gives you a sense of what I'm driving at. The general idea that you have to aggregate and measure in order to locate problems is a natural one, but it is far more effective to extract the full information contained in a small number of informative samples.
P.S. Don't think that recursion is an issue. Recursion manifests as a statement appearing more than once in some samples. It is still the case that the fraction of samples containing the statement is an estimate of its cost. (You might want to think about that for a minute.)
There are other common misconceptions about the technique (such as its only working on small programs), that are explored in the links above.