views:

397

answers:

3

I've got some C code I'm targeting for an AVR. The code is being compiled with avr-gcc, basically the gnu compiler with the right backend.

What I'm trying to do is create a callback mechanism in one of my event/interrupt driven libraries, but I seem to be having some trouble keeping the value of the function pointer.

To start, I have a static library. It has a header file (twi_master_driver.h) that looks like this:

#ifndef TWI_MASTER_DRIVER_H_
#define TWI_MASTER_DRIVER_H_

#define TWI_INPUT_QUEUE_SIZE 256

// define callback function pointer signature
typedef void (*twi_slave_callback_t)(uint8_t*, uint16_t);

typedef struct {
    uint8_t buffer[TWI_INPUT_QUEUE_SIZE];
    volatile uint16_t length; // currently used bytes in the buffer
    twi_slave_callback_t slave_callback;
} twi_global_slave_t;

typedef struct {
    uint8_t slave_address;
    volatile twi_global_slave_t slave;
} twi_global_t;

void twi_init(uint8_t slave_address, twi_global_t *twi, twi_slave_callback_t slave_callback);

#endif

Now the C file (twi_driver.c):

#include <stdint.h>
#include "twi_master_driver.h"

void twi_init(uint8_t slave_address, twi_global_t *twi, twi_slave_callback_t slave_callback)
{
    twi->slave.length = 0;
    twi->slave.slave_callback = slave_callback;

    twi->slave_address = slave_address;

    // temporary workaround <- why does this work??
    twi->slave.slave_callback = twi->slave.slave_callback;
}

void twi_slave_interrupt_handler(twi_global_t *twi)
{
    (twi->slave.slave_callback)(twi->slave.buffer, twi->slave.length);

    // some other stuff (nothing touches twi->slave.slave_callback)
}

Then I build those two files into a static library (.a) and construct my main program (main.c) #include #include #include #include #include "twi_master_driver.h"

//  ...define microcontroller safe way for mystdout ...

twi_global_t bus_a;

ISR(TWIC_TWIS_vect, ISR_NOBLOCK)
{
    twi_slave_interrupt_handler(&bus_a);
}

void my_callback(uint8_t *buf, uint16_t len)
{
    uint8_t i;

    fprintf(&mystdout, "C: ");
    for(i = 0; i < length; i++)
    {
        fprintf(&mystdout, "%d,", buf[i]);
    }
    fprintf(&mystdout, "\n"); 
}

int main(int argc, char **argv)
{
    twi_init(2, &bus_a, &my_callback);

    // ...PMIC setup...

    // enable interrupts.
    sei();

    // (code that causes interrupt to fire)

    // spin while the rest of the application runs...
    while(1){
        _delay_ms(1000);
    }
    return 0;
}

I carefully trigger the events that cause the interrupt to fire and call the appropriate handler. Using some fprintfs I'm able to tell that the location assigned to twi->slave.slave_callback in the twi_init function is different than the one in the twi_slave_interrupt_handler function.

Though the numbers are meaningless, in twi_init the value is 0x13b, and in twi_slave_interrupt_handler when printed the value is 0x100.

By adding the commented workaround line in twi_driver.c:

twi->slave.slave_callback = twi->slave.slave_callback;

The problem goes away, but this is clearly a magic and undesirable solution. What am I doing wrong?

As far as I can tell, I've marked appropriate variables volatile, and I've tried marking other portions volatile and removing the volatile markings. I came up with the workaround when I noticed removing fprintf statements after the assignment in twi_init caused the value to be read differently later on.

The problem seems to be with how I'm passing around the function pointer -- and notably the portion of the program that is accessing the value of the pointer (the function itself?) is technically in a different thread.

Any ideas?

Edits:

  • resolved typos in code.

  • links to actual files: http://straymark.com/code/ [test.c|twi_driver.c|twi_driver.h]

  • fwiw: compiler options: -Wall -Os -fpack-struct -fshort-enums -funsigned-char -funsigned-bitfields -mmcu=atxmega128a1 -DF_CPU=2000000UL

  • I've tried the same code included directly (rather than via a library) and I've got the same issue.

Edits (round 2):

  • I removed all the optimizations, without my "workaround" the code works as expected. Adding back -Os causes an error. Why is -Os corrupting my code?
A: 

Replace "&my_callback" with "my_callback" in function main().

Because different threads access the callback address, try protecting it with a mutex or read-write lock.

If the callback function pointer isn't accessed by a signal handler, then the "volatile" qualifier is unnecessary.

Steve Emmerson
I've tried that, it makes no difference. The value is set (and unchanged) before the interrupts are enabled, so no access mutex read-write locks are required (access controls are by convention).
Mark E
caf
@caf, that's what I have. @Steve Emmerson: the callback function pointer is accessed by a signal handler, that's the point, it's not *modified* by that handler or any other after the intialization, though.
Mark E
Keep the "volatile", then. I still suspect misbehavior regarding sequence points. Try decreasing the level of optimization.
Steve Emmerson
I would also try changing the "-Os" optimization option to "-O0" to see if the problem is caused by optimization.
Steve Emmerson
+1  A: 

Just a hunch, but what happens if you switch these two lines around:

twi->slave.slave_callback = slave_callback;
twi->slave.length = 0;

Does removing the -fpack-struct gcc flag fix the problem? I wonder if you haven't stumbled upon a bug where writing that length field is overwriting part of the callback value.


It looks to me like with the -Os optimisations on (you could try combinations of the individual optimisations enabled by -Os to see exactly which one is causing it), the compiler isn't emitting the right code to manipulate the uint16_t length field when its not aligned on a 2-byte boundary. This happens when you include a twi_global_slave_t inside a twi_global_t that is packed, because the initial uint8_t member of twi_global_t causes the twi_global_slave_t struct to be placed at an odd address.

If you make that initial field of twi_global_t a uint16_t it will probably fix it (or you could turn off struct packing). Try the latest gcc build and see if it still happens - if it does, you should be able to create a minimal test case that shows the problem, so you can submit a bug report to the gcc project.

caf
seems like this is on the right track, I swapped the order and that fixes the problem. Why would the length field be overwriting the callback field?
Mark E
further complicating things, if I reorder the struct elements the problem also seems to go away.
Mark E
+1  A: 

This really sounds like a stack/memory corruption issue. If you run avr-size on your elf file, what do you get? Make sure (data + bss) < the RAM you have on the part. These types of issues are very difficult to track down. The fact that removing/moving unrelated code changes the behavior is a big red flag.

Rob Curtis