Hi

We are currently developing an application for an MSP430 MCU and are running into some weird problems. We discovered that declaring arrays within a scope after the declaration of "normal" variables sometimes causes what seems to be undefined behavior. Like this:

foo(int a, int *b);

int main(void)
{
    int x = 2;
    int arr[5];

    foo(x, arr);

    return 0;
}

foo is passed a pointer as its second argument that sometimes does not point to the arr array. We verified this by single-stepping through the program and seeing that the value of arr (as a pointer) in the main scope is not the same as the value of the pointer b in the foo scope. And no, this is not really reproducible; we have just observed this behavior once in a while.

This is observable even before a single line of the foo function is executed: the passed pointer parameter (b) simply does not point to the address of arr.

Changing the example seems to solve the problem, like this:

foo(int a, int *b);

int main(void)
{
    int arr[5];
    int x = 2;

    foo(x, arr);

    return 0;
}

Does anybody have any input or hints as to why we experience this behavior? Or similar experiences? The MSP430 programming guide specifies that code should conform to the ANSI C89 spec, so I was wondering whether it says that arrays have to be declared before non-array variables?

Any input on this would be appreciated.


Update

@Adam Shiemke and tomlogic:

I'm wondering what C89 specifies about different ways of initializing values within declarations. Are you allowed to write something like:

int bar(void)
{
    int x = 2;
    int y;

    foo(x);
}

And if so, what about:

int bar(int z)
{
    int x = z;
    int y;

    foo(x);
}

Is that allowed? I assume the following must be illegal C89:

int bar(void)
{
    int x = baz();
    int y;

    foo(x);
}

Thanks in advance.


Update 2

Problem solved. Basically we were disabling interrupts before calling the function (foo) and after declarations of variables. We were able to reproduce the problem in a simple example, and the solution seems to be to add a _NOP() statement after the disable interrupt call.
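
A minimal sketch of the ordering described in this update, not the poster's actual example. The names irq_disable, irq_enable, and nop are hypothetical host-side stand-ins; on the target they would map to whatever the toolchain provides for disabling and enabling interrupts and to _NOP(). The stubs only record the call order so the sequence can be checked off-target.

```c
#include <stddef.h>

/* Host-side stand-ins (hypothetical names, not a real MSP430 API).
   Each one appends a letter to a small trace buffer. */
static char trace[8];
static size_t trace_len;

static void record(char c)
{
    /* Keep the last byte free so the trace stays NUL-terminated. */
    if (trace_len + 1 < sizeof trace)
        trace[trace_len++] = c;
}

static void irq_disable(void) { record('D'); } /* target: disable interrupts */
static void irq_enable(void)  { record('E'); } /* target: enable interrupts  */
static void nop(void)         { record('N'); } /* target: _NOP()             */

static void foo(int a, int *b)
{
    b[0] = a;
    record('F');
}

/* Declarations first, then disable interrupts, then the extra no-op,
   and only then the call that takes the array pointer. */
const char *critical_call(void)
{
    int x = 2;
    int arr[5];

    irq_disable();
    nop();          /* the workaround: one no-op right after the disable */
    foo(x, arr);
    irq_enable();

    return trace;
}
```

Calling critical_call() once leaves the trace reading disable, no-op, foo, enable, in that order. Whether a single no-op is sufficient on a given device is something to confirm against that device's errata sheet.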

If anybody is interested I can post the complete example reproducing the problem, and the fix?

Thanks for all the input on this.

+2  A: 

Both examples look to be conforming C89 to me. There should be no observable difference in behaviour assuming that foo isn't accessing beyond the bounds of the array.

Charles Bailey
I think that last phrase may be the essential one here...
Thomas
It is not accessing beyond the array. In the first example the pointer variable b in the foo function scope is literally not pointing to the same address as the array-as-a-pointer variable arr in the main function. This is clearly not correct behavior, so I am curious whether anybody else has seen this before, and how it might be related to the order of variable declarations.
bjarkef
@bjarkef: It shouldn't. It *might* affect the order that the variables are allocated in on the stack but that shouldn't make any difference. If `foo` isn't causing any undefined behaviour in any other way then you must have a compiler, implementation or hardware issue.
Charles Bailey
It should be possible to create a reproducible test case for the problem, too - even if buggy, the compiler should at least be deterministic.
caf
I might try to create a simple reproducible test case at some point, but for now I'm mostly interested in hearing whether anybody else has dealt with a problem with the same symptoms, and what the cause was.
bjarkef
I don't have experience with the MSP430 compiler, but on the off chance that it does happen again, take a look at the generated list file (assuming you have one) and/or memory map to see if the generated code is passing the correct address or not.
tomlogic
+1  A: 

Maybe somewhere in your program you have an illegal memory write that corrupts your stack.

Did you have a look at the disassembly?

codymanix
+2  A: 

For C89, the variables need to be declared in a list at the start of the scope prior to any assignment. C99 allows you to mix assignment and declaration. So:

{ 
    int x; 
    int arr[5];

    x=5;
...

is legal C89 style. I'm surprised your compiler didn't throw some sort of error on that if it doesn't support C99.

Adam Shiemke
C89/C90 allows for variable initializers with the declarations. You can't mix declarations and code though. It would be interesting to see if the problem went away without the initializer; it could be a compiler error related to using that feature.
tomlogic
Hi. Thanks for the input on the C89 spec., please see updated question. I'm a bit unsure about what exactly is allowed under C89 regarding variable initialization.
bjarkef
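
A minimal sketch of the three initializer forms asked about in the updated question. As far as the C89 rules go, the constant-expression requirement applies to objects with static storage duration and to initializer lists for aggregates, so an automatic scalar may be initialized from a constant, a parameter, or a function call. The helpers foo and baz and their return values are hypothetical, chosen only to make the example checkable.

```c
/* Hypothetical helpers standing in for the question's foo and baz. */
static int baz(void) { return 7; }
static int foo(int x) { return x + 1; }

/* Initializer is a constant expression. */
int bar_constant(void)
{
    int x = 2;
    int y;

    y = foo(x);
    return y;
}

/* Initializer is a parameter value. */
int bar_parameter(int z)
{
    int x = z;
    int y;

    y = foo(x);
    return y;
}

/* Initializer is a function call: still a declaration, not a statement,
   so it may precede other declarations in the same block. */
int bar_call(void)
{
    int x = baz();
    int y;

    y = foo(x);
    return y;
}
```

Compiling with a strict C89 mode (for example gcc -std=c89 -pedantic, if that is the host compiler at hand) is one way to check that no declaration-after-statement diagnostic is raised for any of the three.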
+3  A: 

That looks like a compiler bug.

If you use your first example (the problematic one) and write your function call as foo(x, &arr[0]);, do you see the same results? What about if you initialize the array like int arr[5] = {0};? Neither of these should change anything, but if they do it would hint at a compiler bug.

bta
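
A minimal sketch of the comparison suggested above, assuming a stand-in foo that simply remembers the pointer it received (seen_b is a hypothetical name). On a conforming compiler, arr and &arr[0] must arrive in the callee as the same address, so a mismatch here would point at the toolchain or hardware rather than at the C source.

```c
/* Last pointer value received by foo (hypothetical instrumentation). */
static const int *seen_b;

static void foo(int a, int *b)
{
    (void)a;
    seen_b = b;
}

/* Returns 1 if both spellings of the argument reach foo as &arr[0]. */
int same_address_both_ways(void)
{
    int x = 2;
    int arr[5] = {0};
    const int *via_decay;
    const int *via_elem;

    foo(x, arr);        /* array name decays to a pointer to element 0 */
    via_decay = seen_b;

    foo(x, &arr[0]);    /* explicit address of element 0 */
    via_elem = seen_b;

    return via_decay == arr && via_elem == arr;
}
```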
+2  A: 

You should be able to determine if it is a compiler bug based on the assembly code that is produced. Is the assembly different when you change the order of the variable declarations? If your debugger allows you, try single stepping through the assembly.

If you do find a compiler bug, also check your optimization settings. I have seen bugs like this introduced by the optimizer.

semaj
+2  A: 

Assuming the real code is much more complex, here are some things I would check. Keep in mind they are guesses:

Could you be overflowing the stack on occasion? If so, could this be some artifact of "stack defense" by the compiler/uC? Does the incorrect pointer value passed to foo fall inside a predictable memory range? If so, does that range have any significance (inside the stack, etc.)?

Does the MSP430 have different ranges for RAM and ROM addressing? That is, is the address space for RAM 16-bit while the program address space is 24-bit? PICs have such an architecture, for example. If so, it is feasible that arr is being allocated in ROM (24-bit) while the function expects a pointer to RAM (16-bit). The code would work when arr is allocated in the first 16 bits of address space but break if it is above that range.

Mark
Definitely, the first thing I would check is stack corruption. This could be the classic stack overflow, but also a runaway pointer corrupting the stack.
Miro
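
One common way to check for the stack overflow mentioned above is to fill the stack region with a known pattern at startup and later see how much of it is still intact. The sketch below shows the idea on an ordinary buffer; the pattern value and the function names are assumptions, and on a real target the region bounds would have to come from the linker map.

```c
#include <stddef.h>
#include <string.h>

#define PAINT_BYTE 0xA5u   /* arbitrary pattern, assumed unlikely as real data */

/* Fill a memory region with the known pattern. */
void paint_region(unsigned char *region, size_t size)
{
    memset(region, PAINT_BYTE, size);
}

/* Count how many bytes at the low end of the region still hold the
   pattern. For a stack that grows downward, this is the headroom that
   was never used; a result of 0 suggests the stack reached its limit. */
size_t untouched_bytes(const unsigned char *region, size_t size)
{
    size_t n = 0;

    while (n < size && region[n] == PAINT_BYTE)
        n++;
    return n;
}
```

A runaway pointer that writes into the middle of the region shows up the same way, as an unexpectedly small untouched count.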
+1  A: 

In your updated question:

Basically we were disabling interrupts before calling the function (foo) and after declarations of variables. We were able to reproduce the problem in a simple example, and the solution seems to be to add a _NOP() statement after the disable interrupt call.

It sounds as if the interrupt disabling intrinsic/function/macro (or however interrupts are disabled) might be causing an instruction to be 'skipped' or something. I'd investigate whether it is coded/working correctly.

Michael Burr
Try to look at the errata sheet of the MCU (http://focus.ti.com/docs/prod/folders/print/msp430f5438.html). It is filled with conditions that might corrupt the PC, and the workaround is to insert NOP instructions after the affected conditions. I'm considering just always inserting NOP instructions after I do anything involving interrupts or low-power mode.
bjarkef
@bjarkef: after a quick glance at the errata, it sure looks like your workaround might well be the necessary fix. I guess that I've been fortunate in dealing with CPUs that seem to have fewer uncertainties in how the program counter is handled in branches and interrupt handling. Yikes!
Michael Burr