On a protected-mode operating system (such as Windows or Linux), each process is given its own set of memory pages. If the process needs more memory, the operating system can map more pages into it.
Typically the process divides the memory given to it into two parts: the heap and the stack. The current top of the stack (its lowest address on most architectures, because the stack grows downward) is held in the stack pointer: r13 on ARM and esp on x86. When a variable is created on the stack, the stack pointer is moved to make room for it. This can be done with the assembler instruction PUSH, although compilers usually reserve space for all of a function's locals at once by subtracting from the stack pointer. Similarly, when a variable goes out of scope its space is released, for example by POPping it off the stack.
Typically PUSH decrements the stack pointer and stores the value at the new address, so everything at or above the address held in the stack pointer is "on the stack".
The other portion of memory may be used for a heap, which is available for allocation through malloc or new. Each thread must have its own stack, but it may share the heap with the other threads in the process.
When the kernel reschedules a thread, it saves the outgoing thread's stack pointer and loads the stack pointer of the incoming thread, so execution continues on the new stack. It may or may not need to store the program counter separately, depending on the way it does its scheduling.
The cache has nothing to do with either the stack or the heap. It is managed by the processor and keeps data the CPU needs close at hand, so that the CPU does not have to wait for it to be fetched over the bus. It is entirely up to the CPU to ensure that what is in main memory matches what is stored in the cache. The only time one really needs to worry about the cache is when using DMA. Then one has to manually flush or sync the cache to ensure that the CPU does not trust stale cached data and actually fetches the data from main memory.