I am working with a multithreaded embedded application. Each thread is allocated a stack size based on its functionality. Recently we found that one of the threads corrupted its stack by defining an array of local variables that was larger than the stack. The OS is uITRON.

My solution: I registered a timer for 10 ms, and this timer checks for stack corruption.

Stack corruption checking method:
1. Initialize the stack memory with some unique pattern (I use 0x5A5A5A5A).
2. From the timer, check whether the top of the stack memory is still 0x5A5A5A5A.
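
The two steps above can be sketched roughly as follows. This is only an illustration: the names `stack_fill`, `stack_guard_intact` and `GUARD_WORDS` are hypothetical, and it assumes a descending stack (usual on ARM), so the "top" that an overflow reaches is the lowest address of the stack area.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL_PATTERN 0x5A5A5A5Au  /* pattern used in the question */
#define GUARD_WORDS        4u           /* hypothetical: words checked at the overflow end */

/* Step 1: fill the whole stack area with the pattern before the task starts. */
void stack_fill(uint32_t *stack_base, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        stack_base[i] = STACK_FILL_PATTERN;
    }
}

/* Step 2: called from the periodic timer. Returns true while the words at
 * the overflow end still hold the pattern. Assumes a descending stack, so
 * stack_base[0] is the last word an overflow would reach. */
bool stack_guard_intact(const uint32_t *stack_base, size_t words)
{
    size_t n = (words < GUARD_WORDS) ? words : GUARD_WORDS;
    for (size_t i = 0; i < n; i++) {
        if (stack_base[i] != STACK_FILL_PATTERN) {
            return false;
        }
    }
    return true;
}
```

Checking a few words rather than one makes it a little less likely that a large local array skips straight over the guard without writing to it, but it cannot rule that out.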

My question,

Is there a better way to check for this type of corruption?

Forgot to add, adding now: OS : Itron, Processor : ARM9. Compiler : Is not GCC (ARM9 specific supplied by the processor vendor)... And there is no built in support for stack checking...

+1  A: 

Ideally valgrind would support your platform/OS. It's shocking to me that you don't get a separate vm memory region for each thread's stack. If there's any way to build your app so it can run on linux as well, you can probably reproduce the bug there and catch it with valgrind.

wrang-wrang
The platform I am working on does not support Valgrind by default. Should I port it? But this embedded system has limited memory ...
Alphaneo
This seems like a reasonably common situation in an embedded situation, I wouldn't be shocked.
Justicle
That's actually not surprising at all - many ARM9 processors don't have MMUs by default, saves $$ and you don't really need it since you have no swapfile to back it
Paul Betts
@Paul It is quite common for a system to have an MMU but no swap file.
sigjuice
+5  A: 

What compiler are you using? I'm guessing an OS-specific one. If you're using GCC, you may be able to use the Stack-Smashing Protector. This might serve as a fix to prevent the issue on your production system, and would also allow you to detect it in development.

To effectively check for stack corruption, you need to check your available stack space, put guards on both sides of the stack arguments before the call, make the call, and then check the guards on the call's return. This kind of change generally requires modification to the code which the compiler generates.

brianegge
The compiler is not gcc ..
Alphaneo
I wonder if it is possible to write an ugly preprocessor hack, using naked function calls and enough assembly to follow platform call convention plus the guards and checks...
Eugene
@Eugene I'm pretty sure that's what the OP is asking :-)
Justicle
Note that if you're feeling particularly insidious, you can usually get GCC to generate intermediate assembly, tweak that a bit, and have your proprietary/closed assembler chew on that. I have done it before, since GCC's asm generation is leagues ahead of what I'm using in specific cases.
leander
A: 

Yes, valgrind, electricfence, etc.

Lee B
This is not going to help on an embedded platform
Gerhard
+2  A: 

As Lee mentions, your best bet might be to port Electric Fence to your proprietary ARM9 compiler. Failing that, the ARM ABI and stack format are well documented, so you could write a CHECK_STACK function that verifies that the return addresses point to functions, etc.

However, it's hard to write some of these checks properly unless you're the compiler, so if you're not particularly tied to this compiler, GCC does support ARM and it also supports stack guards.

Paul Betts
+6  A: 

ARM9 has JTAG/ETM debugging support on-die; you should be able to set up a data access watchpoint covering e.g. 64 bytes near the top of your stacks, which would then trigger a data abort, which you could catch in your program or externally.

(The hardware I work with only supports 2 read/write watchpoints, not sure if that's a limitation of the on-chip stuff or the surrounding third-party debug kit.)

This document, which is an extremely low-level description of how to interface with the JTAG functionality, suggests you read your processor's Technical Reference Manual -- and I can vouch that there's a decent amount of higher-level info in chapter 9 ("Debug Support") for the ARM946E-S r1p1 TRM.

Before you dig into understanding all this stuff (unless you're just doing it for fun/education), double-check that the hardware and software you're using won't already manage breakpoints/watchpoints for you. The concept of "watchpoint" was a bit hard to find in the debugging software we use -- it was a tab labelled "Hardware" in the add breakpoint dialog.


Another alternative: your compiler may support a command-line option to add function calls at the entry and exit points of functions (some sort of "void enterFunc(const char * callingFunc)" and "void exitFunc(const char * callingFunc)"), for function cost profiling, more accurate stack tracing, or similar. You can then write these functions to check your stack canary value.

(As an aside, in our case we actually ignore the function name that is passed in (I wish I could get the linker to strip these) and just use the processor's link register (LR) value to record where we came from. We use this for getting accurate call traces as well as profiling information; checking the stack canaries at this point would be trivial too!)

The problem is, of course, that calling these functions changes the register and stack profiles for the functions a bit... Not much, in our experiments, but a bit. The performance implications are worse, and wherever there's a performance implication there's the chance of a behavior change in the program, which may mean you e.g. avoid triggering a deep-recursion case that you might have before...
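
A minimal sketch of what such entry/exit hooks might look like when used for canary checking. The hook signatures follow the ones quoted above; everything else (`set_current_stack_limit`, `canary_failures`, the idea that the scheduler publishes the running task's stack limit) is a hypothetical assumption, and the real hook names and the compiler option that enables them depend on your toolchain.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL_PATTERN 0x5A5A5A5Au

/* Hypothetical: pointer to the canary word at the overflow end of the
 * running task's stack, updated by the scheduler on every context switch. */
static const uint32_t *g_current_stack_limit;
static volatile unsigned g_canary_failures;

void set_current_stack_limit(const uint32_t *limit)
{
    g_current_stack_limit = limit;
}

static void check_canary(void)
{
    if (g_current_stack_limit != NULL &&
        *g_current_stack_limit != STACK_FILL_PATTERN) {
        g_canary_failures++;  /* a real system would halt or log here */
    }
}

/* Hooks the compiler is assumed to call at function entry and exit. */
void enterFunc(const char *callingFunc)
{
    (void)callingFunc;  /* name ignored, as in the aside above */
    check_canary();
}

void exitFunc(const char *callingFunc)
{
    (void)callingFunc;
    check_canary();
}

unsigned canary_failures(void)
{
    return g_canary_failures;
}
```

Because the check runs on every call and return, an overflow is noticed within one function call of when it happened, at the cost of the overhead described above.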

leander
Feel free to suggest additions/corrections -- I've never set up a "debug monitor program" as the TRMs are describing here. I'm a little light on knowledge in this area, and the terminology isn't all solidly anchored yet.
leander
+4  A: 

When working on an embedded platform recently, I looked high and low for ways to do this (this was on an ARM7).

The suggested solution was what you've already come up with: initialize the stack with a known pattern and make sure that pattern exists after returning from a function. I thought the same thing: "there's got to be a better way" and "hasn't someone automated this?". The answer to both questions was "No", and I had to dig in just as you've done to try to find where the corruption was occurring.

I also "rolled my own" exception vectors for the data_abort, etc. There are some great examples on the 'net of how to backtrace the call stack. This is something you could do with a JTAG debugger, break when any of these abort vectors occurs and then investigate the stack. This can be useful if you only have 1 or 2 breakpoints (which seems to be the norm for ARM JTAG debugging).

Coleman
+1, Thanks for the data_abort tip, I actually did not use any exception handlers for my stack, and because of which, I had to poll~~
Alphaneo
+3  A: 

I have done exactly as you have suggested on dsPIC using CMX-Tiny+; however, in the stack check I also maintain a 'high-tide mark' for each stack. Rather than checking only the value at the top of the stack, I iterate from the top to find the first non-signature value, and if this is higher than previously, I store it in a static variable. This is done in a lowest-priority task so that it is performed whenever nothing else is scheduled (essentially replacing the idle loop; in your RTOS you may be able to hook the idle loop and do it there). This means that it is typically checked more often than your 10 ms periodic check; in that time the whole scheduler could be screwed.

My methodology is then to oversize the stacks, exercise the code, then check the high-tide marks to determine the margin for each task (and the ISR stack - don't forget that!), and adjust the stacks accordingly if I need to recover the 'wasted' space from the oversize stacks (I don't bother if the space is otherwise not needed).
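
A rough sketch of the high-tide scan, not the actual CMX-Tiny+ code: `stack_monitor_t` and the function names are made up for illustration, the fill pattern is borrowed from the question, and a descending stack is assumed so that the scan starts at the lowest address.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL_PATTERN 0x5A5A5A5Au

typedef struct {
    const uint32_t *base;   /* lowest address of the stack area */
    size_t words;           /* stack size in 32-bit words */
    size_t high_tide;       /* most words ever found in use */
} stack_monitor_t;

/* Count the still-untouched words, scanning from the overflow end until
 * the first word that no longer holds the signature. */
static size_t stack_free_words(const stack_monitor_t *m)
{
    size_t free_words = 0;
    while (free_words < m->words &&
           m->base[free_words] == STACK_FILL_PATTERN) {
        free_words++;
    }
    return free_words;
}

/* Call from the lowest-priority task or idle hook. Records and returns
 * the deepest usage seen so far, in words. */
size_t stack_update_high_tide(stack_monitor_t *m)
{
    size_t used = m->words - stack_free_words(m);
    if (used > m->high_tide) {
        m->high_tide = used;
    }
    return m->high_tide;
}
```

The margin for a task is then `words - high_tide`, which is the number to look at after exercising the code with oversized stacks.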

The advantage of this approach is you don't wait until the stack is broken to detect a potential problem; you monitor it as you develop and as changes are checked in. This is useful since if the corruption hits a TCB or return address, your scheduler may be so broken the check never kicks in after an overflow.

Some RTOSes have this functionality built in (embOS, vxWorks that I know of). OSes that make use of MMU hardware may fare better by placing the stack in a protected memory space so an overflow causes a data abort. That is the 'better way' you seek perhaps; many ARM9 parts have an MMU, but OSes that support it well tend to be more expensive. QNX Neutrino perhaps?

Additional note

If you don't want to do the high-tide checking manually, simply oversize the stacks by, say, 1K, and then in the stack-check task trap the condition when the margin drops below 1K. That way you are more likely to trap the error condition while the scheduler is still viable. Not foolproof, but if you start allocating objects large enough to blow the stack in one go, alarm bells should ring in your head in any case - it's the more common slow stack creep caused by ever deeper function nesting and the like that this will help with.
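
One way the margin trap might look when run over a table of task stacks. This is a sketch under assumptions: `task_stack_t` and `find_low_margin_task` are hypothetical names, the fill pattern is the one from the question, and the stacks are assumed to descend so the 1K margin sits at the lowest addresses.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL_PATTERN 0x5A5A5A5Au
#define MARGIN_WORDS (1024u / sizeof(uint32_t))  /* the 1K oversize margin */

typedef struct {
    const uint32_t *base;   /* lowest address of the task's stack area */
    size_t words;           /* stack size in 32-bit words */
} task_stack_t;

/* Returns the index of the first task that has written into its margin,
 * or -1 if every task still has at least the full margin untouched. */
int find_low_margin_task(const task_stack_t *tasks, size_t count)
{
    for (size_t t = 0; t < count; t++) {
        size_t limit = (tasks[t].words < MARGIN_WORDS) ? tasks[t].words
                                                       : MARGIN_WORDS;
        for (size_t i = 0; i < limit; i++) {
            if (tasks[t].base[i] != STACK_FILL_PATTERN) {
                return (int)t;
            }
        }
    }
    return -1;
}
```
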

Clifford.

Clifford
+1 for mentioning the ISR stack.
Robert
+1 for mentioning about the ISR task, as I forgot that completely. And also thanks for the idea to give extra stack space for debugging..
Alphaneo
+2  A: 

Do you have the kernel source? The last time I wrote a kernel, I added (as an option) stack checking in the kernel itself.

Whenever a context switch was going to occur, the kernel would check 2 stacks:

(1) The task being swapped out -->if the task blew its stack while it was running, let's know right now.

(2) The destination (target) task --> before we jump into the new task, let's make sure some wild code didn't clobber its stack. If its stack is corrupted, don't even switch into the task, we're screwed.

Theoretically the stacks of all tasks could be checked, but the above comments provide the rationale for why I checked these 2 stacks (configurable).
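
A minimal sketch of that two-stack check, assuming each task's stack has a signature word at its overflow end. The `task_t` fragment, the result enum and `check_before_switch` are hypothetical names for illustration; a real kernel would call something like this from its context-switch path.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL_PATTERN 0x5A5A5A5Au  /* assumed signature value */

typedef struct {
    const uint32_t *stack_limit;  /* signature word at the overflow end */
} task_t;                         /* hypothetical fragment of a TCB */

typedef enum {
    SWITCH_OK,
    OUTGOING_OVERFLOWED,   /* case (1): task being swapped out blew its stack */
    INCOMING_CORRUPTED     /* case (2): target task's stack was clobbered */
} switch_check_t;

static bool limit_word_intact(const task_t *t)
{
    return *t->stack_limit == STACK_FILL_PATTERN;
}

/* Check both stacks involved in a context switch, outgoing task first. */
switch_check_t check_before_switch(const task_t *outgoing,
                                   const task_t *incoming)
{
    if (!limit_word_intact(outgoing)) {
        return OUTGOING_OVERFLOWED;
    }
    if (!limit_word_intact(incoming)) {
        return INCOMING_CORRUPTED;
    }
    return SWITCH_OK;
}
```

Checking only these two stacks keeps the cost per switch constant regardless of how many tasks exist.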

In addition to this, application code can monitor tasks (incl. the interrupt stack if you have one) in the idle loop, the tick ISR, etc...

Dan
+2  A: 

Check out these similar questions: "handling stack overflows in embedded systems" and "how can I visualise the memory (SRAM) usage of an AVR program".

Personally I would use the Memory Management Unit of your processor if it has one. It can do memory checking for you with minimal software overhead.

Set up a memory area in the MMU that will be used for the stack. It should be bordered by two memory areas where the MMU does not allow access. When your application is running you will receive an exception/interrupt as soon as you overflow the stack.

Because you get an exception at the moment the error occurs, you know exactly where in your application the stack went bad. You can look at the call stack to see exactly how you got to where you are. This makes it a lot easier to find your problem than trying to figure out what is wrong by detecting your problem long after it happened.

An MMU can also detect null pointer accesses if you disallow memory access to the bottom part of your RAM.

If you have the source of the RTOS you can build MMU protection of the stack and heap into it.

Gerhard