There might be (highly system-dependent) issues with cache locality and read/write misses. If your program works on both stack and heap data, it is conceivable (depending on your cache architecture) that you run into more cache misses than if it worked entirely on one contiguous region of the stack. Here is a paper by Andrew Appel (of SML/NJ) and Zhong Shao that investigates exactly this question, since stack vs. heap allocation is an important topic in the implementation of functional languages:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.3778
They found some performance problems with write misses but estimated these would be resolved by advances in caching.
So my guess for a contemporary desktop/server machine is that, unless you're running heavily optimized, architecture-specific code that streams data along the cache lines, you won't notice any difference between stack and heap accesses. Things might be different for devices with small caches (like ARM/MIPS controllers), where ignoring the cache can have noticeable performance effects anyway.
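
If you want to check this on your own machine, here is a rough micro-benchmark sketch in C (the array size, repetition count and variable names are assumptions I picked, not anything from the paper). It times a sequential sum over one contiguous stack array against the same values spread over individual heap allocations; allocator behavior, cache size and compiler optimizations can all skew the result, so treat it as a starting point rather than proof:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    enum { N = 1 << 16, REPS = 2000 };  /* ~256 KB of ints; arbitrary choice */

    int main(void)
    {
        /* One contiguous block in the current stack frame: good spatial
         * locality. N must stay well below the stack size limit. */
        int on_stack[N];
        for (int i = 0; i < N; i++)
            on_stack[i] = i;

        /* The same values, each in its own heap allocation: successive
         * accesses may jump around the address space, depending on the
         * allocator. */
        int **scattered = malloc(N * sizeof *scattered);
        for (int i = 0; i < N; i++) {
            scattered[i] = malloc(sizeof **scattered);
            *scattered[i] = i;
        }

        long long sum_stack = 0, sum_heap = 0;

        clock_t t0 = clock();
        for (int r = 0; r < REPS; r++)
            for (int i = 0; i < N; i++)
                sum_stack += on_stack[i];

        clock_t t1 = clock();
        for (int r = 0; r < REPS; r++)
            for (int i = 0; i < N; i++)
                sum_heap += *scattered[i];
        clock_t t2 = clock();

        printf("contiguous stack: %.3fs  scattered heap: %.3fs  (sums %lld %lld)\n",
               (double)(t1 - t0) / CLOCKS_PER_SEC,
               (double)(t2 - t1) / CLOCKS_PER_SEC,
               sum_stack, sum_heap);

        for (int i = 0; i < N; i++)
            free(scattered[i]);
        free(scattered);
        return 0;
    }

Compile with optimizations (e.g. gcc -O2) and be aware that the compiler may vectorize the contiguous loop but not the pointer-chasing one; counting cache misses with a profiler such as perf tells you more than wall-clock time alone.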