If you are using a processor with a Memory Management Unit (MMU), your hardware can do this for you with minimal software overhead. Most modern 32-bit processors have one, and more and more 32-bit microcontrollers feature one as well.
Set up a memory area in the MMU that will be used for the stack. It should be bordered on both sides by memory areas to which the MMU does not allow access. When your application is running, you will receive an exception/interrupt as soon as you overflow the stack.
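To make the layout concrete, here is a minimal sketch of the same idea on a hosted POSIX system, where mmap/mprotect stand in for programming the MMU directly. On a bare-metal target you would set up the page tables or region registers of your particular MMU instead. The names guarded_stack, guarded_stack_create, guarded_stack_destroy and write_faults are made up for this illustration, and the signal handler only exists so the fault can be observed without killing the process.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper type: a stack area with one no-access page on each side. */
typedef struct {
    unsigned char *mapping;   /* start of the whole mapping (low guard page) */
    size_t mapping_size;      /* guard + stack + guard */
    unsigned char *stack;     /* first usable byte of the stack area */
    size_t stack_size;        /* usable size, rounded up to whole pages */
} guarded_stack;

/* Create the layout [guard page][stack pages][guard page]. Returns 0 on success. */
int guarded_stack_create(guarded_stack *gs, size_t min_size)
{
    assert(gs != NULL);
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t pg = (size_t)page;
    size_t usable = ((min_size + pg - 1) / pg) * pg;
    if (usable == 0)
        usable = pg;
    size_t total = usable + 2 * pg;

    /* Reserve everything with no access rights, then open up only the middle. */
    void *p = mmap(NULL, total, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    if (mprotect((unsigned char *)p + pg, usable, PROT_READ | PROT_WRITE) != 0) {
        munmap(p, total);
        return -1;
    }
    gs->mapping = (unsigned char *)p;
    gs->mapping_size = total;
    gs->stack = (unsigned char *)p + pg;
    gs->stack_size = usable;
    return 0;
}

void guarded_stack_destroy(guarded_stack *gs)
{
    assert(gs != NULL);
    munmap(gs->mapping, gs->mapping_size);
}

static sigjmp_buf fault_jmp;

static void on_fault(int sig)
{
    (void)sig;
    siglongjmp(fault_jmp, 1);   /* jump back out of the faulting write */
}

/* Demo only: returns 1 if writing one byte to addr raises a memory fault, else 0. */
int write_faults(volatile unsigned char *addr)
{
    struct sigaction sa, old_segv, old_bus;
    volatile int faulted = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);   /* some systems report SIGBUS instead */

    if (sigsetjmp(fault_jmp, 1) == 0)
        *addr = 0xAA;                   /* faults if addr is in a guard page */
    else
        faulted = 1;

    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return faulted;
}
```

With this in place, write_faults(gs.stack) and write_faults(gs.stack + gs.stack_size - 1) should succeed, while a write one byte outside either end of the stack area should trap immediately. That immediate trap is the behaviour you want from the MMU on your target.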
Because you get the exception at the moment the error occurs, you know exactly where in your application the stack went bad. You can look at the call stack to see exactly how you got there. This makes it a lot easier to find the problem than trying to work out what went wrong when you only detect it long after it happened.
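Inside the exception handler, the MMU normally reports the address of the offending access, and comparing it with the stack layout tells you which boundary was crossed. The following is only a sketch of that comparison. It assumes a stack that grows downward, and stack_layout, fault_kind and classify_fault are names invented for this example. How you obtain the fault address depends on your processor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    FAULT_OTHER,            /* not adjacent to this stack */
    FAULT_STACK_OVERFLOW,   /* access landed in the guard area below the stack */
    FAULT_STACK_UNDERFLOW   /* access landed in the guard area above the stack */
} fault_kind;

/* Hypothetical description of one stack and its guard areas. */
typedef struct {
    uintptr_t stack_lo;     /* lowest valid stack address */
    uintptr_t stack_hi;     /* one past the highest valid stack address */
    size_t guard_size;      /* size of each no-access area */
} stack_layout;

/* Assumes a downward-growing stack, so running off the low end is an overflow. */
fault_kind classify_fault(const stack_layout *l, uintptr_t fault_addr)
{
    assert(l != NULL);
    if (fault_addr < l->stack_lo && l->stack_lo - fault_addr <= l->guard_size)
        return FAULT_STACK_OVERFLOW;
    if (fault_addr >= l->stack_hi && fault_addr - l->stack_hi < l->guard_size)
        return FAULT_STACK_UNDERFLOW;
    return FAULT_OTHER;
}
```

A handler could log the result together with the saved program counter before halting or restarting, so the report already says "stack overflow" rather than just "memory fault".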
I have used this successfully on PPC and AVR32 processors. When you start out using an MMU, it feels like a waste of time, since you got along fine without it for many years. But once you see the advantage of getting an exception at the exact spot where your memory problem occurs, you will never go back. An MMU can also detect null pointer accesses if you disallow access to the bottom part of your memory map, around address zero.
If you are using an RTOS, your MMU also protects the memory and stacks of other tasks, so errors in one task should not affect them. This means you could also easily restart the faulting task without affecting the other tasks.
In addition, a processor with an MMU usually also has lots of RAM, so your program is a lot less likely to overflow its stack, and you don't need to fine-tune everything to get your application to run correctly with a small memory footprint.
An alternative would be to use the processor's debug facilities to cause an interrupt on a memory access to the end of your stack. This will probably be very processor specific.