



I'm working on an embedded project and I'm trying to add more structure to some of the code, which uses macros to optimize access to registers for USARTs. I'd like to organize preprocessor #define'd register addresses into const structures. If I define the structs as compound literals in a macro and pass them to inline'd functions, gcc has been smart enough to bypass the pointer in the generated assembly and hardcode the structure member values directly in the code. E.g.:


struct uart {
   volatile uint8_t * ucsra, * ucsrb, *ucsrc, * udr;
   volatile uint16_t * ubrr;
};

#define M_UARTX(X)                  \
    ( (struct uart) {               \
        .ucsra = &UCSR##X##A,       \
        .ucsrb = &UCSR##X##B,       \
        .ucsrc = &UCSR##X##C,       \
        .ubrr  = &UBRR##X,          \
        .udr   = &UDR##X,           \
    } )

void inlined_func(const struct uart * p, other_args...) {
    (*p->ucsra) = 0;
    (*p->ucsrb) = 0;
    (*p->ucsrc) = 0;
}

int main(){
     inlined_func(&M_UARTX(0), other_parms...);
}

Here UCSR0A, UCSR0B, etc., are defined as the UART registers as l-values, like

#define UCSR0A (*(uint8_t*)0xFFFF)

gcc was able to eliminate the structure literal entirely, and all assignments like those shown in inlined_func() write directly to the register address, without having to read the register's address into a machine register, and without indirect addressing:


movb $0, UCSR0A
movb $0, UCSR0B
movb $0, UCSR0C

This writes the values directly into the USART registers without loading the addresses into a machine register, so gcc never needs to emit the struct literal into the object file at all. The struct literal becomes a compile-time structure, with no cost in the generated code for the abstraction.
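For anyone who wants to reproduce this on a host compiler, here is a minimal self-contained sketch of the same pattern. It rests on assumptions: the `fake_*` variables are hypothetical stand-ins for the real registers (on the target, UCSR0A and friends come from the vendor header), and `uart_reset` and `reset_uart0` are illustrative names rather than the real functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the hardware registers, so the sketch runs on a
   host. On the real target these l-value macros come from the vendor header. */
static uint8_t  fake_ucsr0a = 0xFF, fake_ucsr0b = 0xFF, fake_ucsr0c = 0xFF;
static uint8_t  fake_udr0   = 0xFF;
static uint16_t fake_ubrr0  = 0xFFFF;

#define UCSR0A (*(volatile uint8_t  *)&fake_ucsr0a)
#define UCSR0B (*(volatile uint8_t  *)&fake_ucsr0b)
#define UCSR0C (*(volatile uint8_t  *)&fake_ucsr0c)
#define UBRR0  (*(volatile uint16_t *)&fake_ubrr0)
#define UDR0   (*(volatile uint8_t  *)&fake_udr0)

/* The struct holds pointers to the registers; it is not overlaid on them. */
struct uart {
    volatile uint8_t  *ucsra, *ucsrb, *ucsrc, *udr;
    volatile uint16_t *ubrr;
};

/* Compound literal built from the register names for UART number X. */
#define M_UARTX(X)                  \
    ( (struct uart) {               \
        .ucsra = &UCSR##X##A,       \
        .ucsrb = &UCSR##X##B,       \
        .ucsrc = &UCSR##X##C,       \
        .ubrr  = &UBRR##X,          \
        .udr   = &UDR##X,           \
    } )

/* Illustrative inline function: clears the three control registers. */
static inline void uart_reset(const struct uart *p)
{
    assert(p != 0);
    *p->ucsra = 0;
    *p->ucsrb = 0;
    *p->ucsrc = 0;
}

/* Call site: the literal's address is visible to the compiler right here. */
static inline void reset_uart0(void)
{
    uart_reset(&M_UARTX(0));
}
```

Compiling this with optimization and inspecting the listing (e.g. gcc -O2 -S) is the way to check whether the struct literal is folded away for a given compiler version.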

I wanted to get rid of the use of the macro at the call site, and tried using a static constant array of struct pointers defined in the header:


#define M_UART0 M_UARTX(0)
#define M_UART1 M_UARTX(1)

static const struct uart * const uart[2] = { &M_UART0, &M_UART1 };

int main(){
     inlined_func(uart[0], other_parms...);
}

However, gcc cannot remove the struct entirely here:


movl __compound_literal.0, %eax
movb $0, (%eax)
movl __compound_literal.0+4, %eax
movb $0, (%eax)
movl __compound_literal.0+8, %eax
movb $0, (%eax)

This loads the register addresses into a machine register, and uses indirect addressing to write to the register. Does anyone know of any way I can convince gcc to generate the first assembly listing (direct writes) for the second version of the C code (the static const array)? I've tried various uses of the __restrict modifier, to no avail.

+1  A: 

After many years of experience with UARTs and USARTs, I have come to these conclusions:

Don't use a struct for a 1:1 mapping with UART registers.

Compilers can add padding between struct members without your knowledge, thus messing up the 1:1 correspondence.

Writing to UART registers is best done directly or through a function.

Remember to use the volatile modifier when defining pointers to the registers.

Very little performance gain with assembly language

Assembly language should only be used if the UART is accessed through processor ports rather than memory-mapped. The C language has no support for ports. Accessing UART registers through pointers is very efficient (generate an assembly language listing and verify). Sometimes, it may take more time to code in assembly and verify.

Isolate UART functionality into a separate library

This is a good candidate. Besides, once the code has been tested, let it be. Libraries don't have to be (re)compiled all the time.
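
The points above about volatile pointers and access through a function could be sketched roughly as follows. This is an illustration under assumptions, not code from a real driver: `fake_udr`, `UART_DATA`, `reg_write8` and `reg_read8` are hypothetical names, and the stand-in variable replaces what would be a fixed register address on real hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in so the sketch runs on a host; on a real target the
   macro would wrap the fixed register address taken from the datasheet. */
static uint8_t fake_udr;
#define UART_DATA (*(volatile uint8_t *)&fake_udr)

/* Access through a function: the volatile qualifier tells the compiler not
   to drop, merge, or cache the access. */
static inline void reg_write8(volatile uint8_t *reg, uint8_t value)
{
    assert(reg != 0);
    *reg = value;
}

static inline uint8_t reg_read8(const volatile uint8_t *reg)
{
    assert(reg != 0);
    return *reg;
}
```

Direct access is then just `UART_DATA = 0x41;`, and the function form is `reg_write8(&UART_DATA, 0x41);`. Generating an assembly listing shows whether both compile to the same store on a given target.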

Thomas Matthews
The compiler can add all the padding it wants; the members of the structs are *pointers* to the registers. The struct is holding the register pointers, and is *not* being mapped to the uart registers. And I'm not writing assembly; I'm just showing what gcc generates.
Tim Schaeffer
+1  A: 

Using structs "across compile domains" is a cardinal sin in my book: that is, using a struct to point at something, anything: file data, memory, etc. The reason is that it will fail; it is not reliable, no matter the compiler. There are many compiler-specific flags and pragmas for this, but the better solution is to just not do it. If you want to point at address plus 8, point at address plus 8; use a pointer or an array.

In this specific case I have had way too many compilers fail to do even that, so I write assembler PUT32/GET32 and PUT16/GET16 functions to guarantee that the compiler doesn't mess with my register accesses. As with structs, you will get burned one day and have a hell of a time figuring out why your 32-bit register only had 8 bits written to it.

The overhead of the jump to the function is worth the peace of mind and the reliability and portability of the code. It also makes your code extremely portable: you can put wrappers in for the put and get functions to cross networks, run your hardware in an HDL simulator and reach into the simulation to read/write registers, etc., with a single chunk of code that doesn't change from simulation to embedded to OS device driver to application-layer function.
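
To show the shape of the interface only, here is a rough C sketch of the put/get functions. The versions described above are written in assembler precisely so the compiler cannot alter the access width; a C version like this one does not carry that guarantee. The `uintptr_t` address parameter, the alignment assert, and the `fake_reg` stand-in are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit device register, so this runs on a host. */
static uint32_t fake_reg;

/* Store one 32-bit value at the given address. */
void PUT32(uintptr_t addr, uint32_t value)
{
    assert((addr & 3u) == 0);            /* expect a 4-byte aligned register */
    *(volatile uint32_t *)addr = value;
}

/* Load one 32-bit value from the given address. */
uint32_t GET32(uintptr_t addr)
{
    assert((addr & 3u) == 0);
    return *(volatile uint32_t *)addr;
}
```

Because callers only ever see PUT32 and GET32, the bodies can be swapped for an assembler version, a simulator hook, or a network wrapper without touching the calling code.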

Thomas' comment to the other answer applies here as well - you misread his code...
eh.. that should read Tim's comment to Thomas' answer ;)