I think it's a good question. To answer it requires having a general framework for thinking about performance, so let me try to provide one. (Some of this is going to sound really obvious, but bear with me.)
To keep things simple,
let's just consider the simple case of applications that have a specific job to do, and that start, and then finish, and what you care about is wall-clock time. Let's assume a standard CPU cycle rate, and a mono-processor.
The time duration consists of a stream of time-slices (nanoseconds, say). To do that job, there is a minimum amount of time required, and it is usually greater than zero. There is no maximum amount of time required. If a program spends longer than the minimum number of nanoseconds, then some of those nanoseconds are being spent, strictly speaking, unnecessarily (i.e. for poor reasons).
So, to optimize a program's execution time, it is necessary to find the nanoseconds it is spending that do not have to be spent (i.e. that do not have good reasons) and remove them.
One way to do this is to, if possible, step through the program and keep track at each step of why it is doing that step. If the reason is not good, there is an opportunity for removing steps.
Another way to do this is to select nanoseconds at random from the program's execution, and inquire their reasons. For example, the program counter can tell you what the program is doing, but the call stack can tell you why. In order for the nanosecond to be spent for a good reason, every call instruction on the call stack has to have a good reason. If any instruction on the call stack does not have a good reason, then there is an opportunity to optimize. In fact, the amount of time that instruction is on the call stack is the amount of time that would be saved by its removal.
In some kinds of software that are highly asynchronous, message-driven, or interpreted, the call stack may not provide enough information. In that case, to answer why a given nanosecond is being spent may be more difficult. It may require examining more state information than just the call stack. For example, in an interpreter, the stack of the program being interpreted may also need to be examined. However, often the hardware call stack does provide sufficient information, so it is a useful thing to examine.
Now, to try to answer your question.
There is such a thing as a "hot spot". This is a small set of addresses that are often at the bottom of the call stack. Nanoseconds spent in that code may or may not have good reasons.
There is such a thing as a "performance problem". This is an instruction that often accounts for why nanoseconds are being spent, but that does not have a good reason. Such an instruction may be in a hot spot. It may also be a subroutine call instruction. (It cannot be both.) It may be an instruction to send a message to be processed later, that does not have a good reason for being spent. To optimize software, such instructions (not functions) are what are being looked for.
Languages, loosely speaking, are either compiled into machine language or interpreted. Interpreted languages are usually 1 or 2 orders of magnitude slower than compiled, because they are constantly re-determining what they need to do. However, roughly speaking, this is only a performance problem if it occurs in a hot spot. If a program spends all its time calling compiled library functions, or waiting for I/O completions, then its speed of execution probably doesn't matter, because most of the nanoseconds are being spent for other reasons.
Now, certainly, any language or program can in principle be highly non-optimal, but in terms of compilers, for hotspot code, they are mostly pretty good, give or take maybe 30%. If there is a background process involved, like garbage collection, that adds an overhead, but it depends on the rate at which the program generates garbage.
So to sum up, the speed of a language matters in hotspot code, but not much elsewhere. When a program has been optimized by removal of all other performance problems, and if the hotspot code is actually seen by the compiler/interpreter, then speed of language matters.