I'd say it's not so much an issue with compilers as with CPUs. A compiler has to work with whatever the target architecture provides.
Here's what the other answers are glossing over: it depends on the architecture of the CPU at the level of the actual circuitry. Every machine instruction boils down to: get data from somewhere, modify the data, then load (or jump to) the next instruction.
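That "get data, modify it, pick the next instruction" cycle can be sketched in a few lines. This is a hypothetical toy machine, not any real instruction set; the `ADD`, `GOTO`, and `HALT` opcodes and the cell-addressed memory are invented for illustration:

```python
# Toy sketch of the fetch/modify/next cycle on a made-up machine.
# Instructions are tuples; "memory" is just a list of cells.
def run(program, memory):
    pc = 0  # program counter: where the next instruction is loaded from
    while pc < len(program):
        op, *args = program[pc]
        if op == "ADD":      # get data from two cells, modify, store the result
            dst, a, b = args
            memory[dst] = memory[a] + memory[b]
            pc += 1          # fall through to the next instruction in sequence
        elif op == "GOTO":   # change where the next instruction comes from
            pc = args[0]
        elif op == "HALT":
            break
    return memory

# Cells 0 and 1 hold the inputs; cell 2 receives the sum.
print(run([("ADD", 2, 0, 1), ("HALT",)], [3, 4, 0]))  # [3, 4, 7]
```

Everything the rest of this answer says about stacks and registers is really about the "get data from somewhere" step of this loop.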
Analogy
Think of the problem like a woodworker building or repairing a chair for you. His questions will be "Where is the chair?" and "What needs to be done to it?" He might be able to fix it at your house, or he might need to take the chair back to his shop to work on it. Either way works, but it depends on how prepared he is to work outside a fixed location. Working on-site could slow him down, or it could be his specialty.
Now, back to the CPU.
Explanation
Regardless of how parallel a CPU may be (it might have several adders or instruction-decode pipelines), those circuits sit at specific locations on the chip, and data must be moved into the places where an operation can be performed. The program is responsible for moving data into and out of those locations. A stack-based machine may provide instructions that appear to modify data in place, but the housekeeping is still happening in the microcode.

An adder works the same way regardless of whether the data came from the stack or from the heap; the difference is in the programming model exposed to the programmer. Registers are basically a defined place to stage data you want to work on.
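To make the "same adder, different programming model" point concrete, here is a hypothetical sketch of the same addition expressed both ways. The `adder` function stands in for the fixed adder circuit; the two wrappers only differ in how operands are staged, which is exactly the difference between the two models (all names here are invented for illustration):

```python
def adder(a, b):
    return a + b  # stands in for the one fixed adder circuit on the chip

def stack_add(x, y):
    # Stack model: operands are implicit; they come from the top of the stack.
    stack = []
    stack.append(x)              # PUSH x
    stack.append(y)              # PUSH y
    b, a = stack.pop(), stack.pop()
    stack.append(adder(a, b))    # ADD: pops two, pushes the result
    return stack.pop()

def register_add(x, y):
    # Register model: operands are named locations the program chooses.
    regs = {"r0": x, "r1": y}                   # MOV r0, x / MOV r1, y
    regs["r2"] = adder(regs["r0"], regs["r1"])  # ADD r2, r0, r1
    return regs["r2"]

print(stack_add(3, 4), register_add(3, 4))  # 7 7
```

Both paths funnel their operands into the same `adder`; the stack version hides the data movement behind push/pop conventions, while the register version makes the program spell out where the data sits.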