views:

147

answers:

5

I'm using an x86-based core to manipulate a 32-bit memory-mapped register. My hardware behaves correctly only if the CPU generates 32-bit-wide reads and writes to this register. The register is aligned on a 32-bit address and is not addressable at byte granularity.

What can I do to guarantee that my C (or C99) compiler will only generate full 32-bit wide reads and writes in all cases?

For example, if I do a read-modify-write operation like this:

volatile uint32_t* p_reg = (volatile uint32_t*)0xCAFE0000;
*p_reg |= 0x01;

I don't want the compiler to get smart about the fact that only the bottom byte changes and generate 8-bit-wide reads/writes. Since 8-bit operations often have denser machine-code encodings on x86, I'm afraid of this kind of unwanted optimization. Disabling optimization in general is not an option.

----- EDIT -------
An interesting and very relevant paper: http://www.cs.utah.edu/~regehr/papers/emsoft08-preprint.pdf

A: 

Well, generally speaking I wouldn't expect the compiler to optimize out the high-order bytes if you have the register typed as a 32-bit volatile. Because of the volatile qualifier, the compiler cannot assume that the values in the high-order bytes are 0x00, so it must write the full 32 bits even if you are only using an 8-bit literal value. I've never experienced an issue with this on x86, TI processors, or other embedded processors; generally the volatile keyword is enough. The only time things get a little weird is if the processor does not natively support the word size you're trying to write, but that shouldn't be an issue on x86 for a 32-bit number.

While it would be possible for the compiler to generate an instruction stream that used 8-bit writes, that would not be an optimization in either processor time or instruction space over a single 32-bit write.

NoMoreZealots
The volatile qualifier does not prevent the compiler from narrowing the access width from 32 bits to 8 bits. From its point of view, the upper 24 volatile bits are untouched. Also, 8-bit instruction encodings result in fewer instruction bytes, so -Os optimization has a reason to prefer them.
srking
It prevents the compiler from assuming that the value is the same as what was read. It can't narrow the access because it is required to write back all 32 bits to guarantee the value is what it's supposed to be.
NoMoreZealots
A: 

If you don't use byte (unsigned char) types when accessing the hardware, there is a better chance of the compiler not generating 8-bit data-transfer instructions.

volatile uint32_t* p_reg = (volatile uint32_t*)0xCAFE0000;
const uint32_t value = 0x01;  // This trick tells the compiler the constant is 32 bits wide.
*p_reg |= value;

You would have to read the port as a 32-bit value, modify the value, then write it back:

uint32_t reg_value = *p_reg;
reg_value |= 0x01;
*p_reg = reg_value;
Thomas Matthews
Agree, but looking for something stronger than "better chance".
srking
+4  A: 

The ONLY way to GUARANTEE that the compiler will do the right thing is to write your load and store routines in assembler and call them from C. 100% of the compilers I have used over the years (GCC included) can and will get this wrong.

Sometimes the optimizer gets you. For example, say you want to store a constant that appears to the compiler as a small number, 0x10 let's say, into a 32-bit register. That is what you asked about specifically, and it is what I have watched otherwise-good compilers try to do: some compilers will decide that it is cheaper to do an 8-bit write instead of a 32-bit write and change the instruction. Variable-instruction-length targets make this worse, since the compiler is trying to save program space and not just memory cycles on what it assumes the bus to be (xor ax,ax instead of mov eax,0, for example).

And with something that is constantly evolving like gcc, code that works today has no guarantee of working tomorrow (you can't even compile some versions of gcc with the current version of gcc). Likewise, code that works with the compiler on your desk may not work universally for others.

Take the guessing and the experimenting out of it, and create load and store functions.

The side benefit to this is that you create a nice abstraction layer, if/when you want to simulate your code in some fashion or have the code run in application space instead of on the metal, or vice versa, the assembler functions can be replaced with a simulated target or replaced with code that crosses a network to a target with the device on it, etc.
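A minimal sketch of such accessor routines, written in C with GCC attributes rather than assembler (the function names and the use of `noinline` are my assumption, not something from the answer). Keeping them in their own translation unit and non-inlined denies the caller's optimizer any view of the access width:

```c
#include <stdint.h>

/* reg_io.c -- hypothetical accessor names; keep these in a separate
 * translation unit so the caller's optimizer cannot see through them. */

__attribute__((noinline))
uint32_t reg_read32(volatile uint32_t *reg)
{
    return *reg;                    /* one 32-bit load */
}

__attribute__((noinline))
void reg_write32(volatile uint32_t *reg, uint32_t value)
{
    *reg = value;                   /* one 32-bit store */
}

/* Read-modify-write built on top of the accessors. */
void reg_set_bits32(volatile uint32_t *reg, uint32_t mask)
{
    reg_write32(reg, reg_read32(reg) | mask);
}
```

Swapping the bodies out later for assembler, a simulator stub, or a networked remote target changes no caller, which is exactly the abstraction layer described above.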

dwelch
I've been writing hardware interfaces for 15 years and have never had to write assembler to ensure a 32-bit write access. In practice, volatile tells the compiler it can make no assumptions about the prior value of a memory address between instructions.
NoMoreZealots
agreed, but I have had it fail. I was going to add to my comment: if you make your load and store routines functions in a separate .c file from the one you are calling them from, don't inline them, and don't let, say, llvm try to optimize the whole application, then you can avoid the assembler and have a good shot at it working reliably
dwelch
If we want to play that game, I have been doing this for over 20 years, on many platforms with many compilers. For the most part it works, but there are times you get into a rut and cannot figure out why the compiler is optimizing out or changing your code. It can work for weeks or months, then you add or change the nth line of code and the way it compiles changes. The user said guarantee. If you don't want "guaranteed to work" but instead "works most of the time" (say more than 99% but less than 100%), then volatile (in a separate function in a separate file) will meet that requirement.
dwelch
+5  A: 

Your concerns are covered by the volatile qualifier.

6.7.3/6 "Type qualifiers" says:

An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine, as described in 5.1.2.3. Furthermore, at every sequence point the value last stored in the object shall agree with that prescribed by the abstract machine, except as modified by the unknown factors mentioned previously. What constitutes an access to an object that has volatile-qualified type is implementation-defined.

5.1.2.3 "Program execution" says (among other things):

In the abstract machine, all expressions are evaluated as specified by the semantics.

This is followed by a sentence that is commonly referred to as the 'as-if' rule, which allows an implementation to not follow the abstract machine semantics if the end result is the same:

An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).

But 6.7.3/6 essentially says that the 'as-if' rule cannot be applied to volatile-qualified types used in an expression: the actual abstract-machine semantics must be followed. Therefore, if a pointer to a volatile 32-bit type is dereferenced, the full 32-bit value must be read or written (depending on the operation).
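Applied to the pattern from the question, the guarantee looks like this (a sketch; the function name is made up, and the register address is passed in rather than hard-coded so the snippet stays self-contained):

```c
#include <stdint.h>

/* Under C99 6.7.3/6, both accesses below must happen at the declared
 * width: one 32-bit read of *p_reg, then one 32-bit write back.
 * The compiler may not narrow this to an 8-bit access even though
 * only the low byte changes. */
void set_low_bit(volatile uint32_t *p_reg)
{
    *p_reg |= 0x01;
}
```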

Michael Burr
A: 

Since a read-modify-write operation against hardware is always risky to perform in several instructions, most processors offer an instruction that manipulates a register/memory location in one single, uninterruptible instruction.

Depending on what type of register you are manipulating, it could change during your modify phase, and you would then write back a stale value.

I would recommend, as dwelch suggests, writing your own read-modify-write function in assembly if this is critical.

I have never heard of a compiler that optimizes a type (doing a type conversion for the purpose of optimization). If it is declared as an int32, it is always an int32 and will always be aligned correctly in memory. Check your compiler documentation to see how the various optimizations work.

I think I know where your concern comes from: structures. Structures are usually padded to the optimal alignment. This is why you need to wrap them in a #pragma pack() to get them byte-aligned.
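A quick illustration of that padding (the struct names are made up, and the exact sizes are implementation-defined; the offsets shown are what typical x86 ABIs produce, where uint32_t has 4-byte alignment):

```c
#include <stdint.h>
#include <stddef.h>

/* Natural layout: 3 padding bytes are inserted before 'data'
 * so it lands on a 4-byte boundary. */
struct reg_overlay {
    uint8_t  flags;   /* offset 0 */
    uint32_t data;    /* offset 4 on typical x86 ABIs */
};

#pragma pack(push, 1)
/* Packed layout: no padding, so 'data' starts at offset 1 and is
 * therefore byte-aligned (and may be accessed unaligned). */
struct reg_overlay_packed {
    uint8_t  flags;   /* offset 0 */
    uint32_t data;    /* offset 1 */
};
#pragma pack(pop)
```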

You can just single-step through the assembly, and you will see how the compiler translated your code. I'm pretty sure it has not changed your type.

Max Kielland