Supposing I have the following type from an external library:

union foreign_t {
    struct {
        enum enum_t an_enum;
        int an_int;
    } header;
    struct {
        double x, y;
    } point;
};

is it safe to assume the following code fragment will work as expected on different platforms and with different compilers?

struct pair_t {
    double x, y;
};

union foreign_t foreign;
struct pair_t *p_pair;

p_pair = (struct pair_t *) &foreign;
p_pair->x = 1234;
p_pair->y = 4321;

/* Expected result: (1234, 4321) or something like that */
printf("(%lf, %lf)", foreign.point.x, foreign.point.y);


EDIT:

Following the strict aliasing suggestion I made the following test:

#include <stdint.h>
#include <stdio.h>

int main()
{
    uint16_t word = 0xabcd;
    uint8_t tmp;
    struct {
        uint8_t low;
        uint8_t high;
    } *byte = (void *) &word;

    tmp = byte->low;
    byte->low = byte->high;
    byte->high = tmp;

    printf("%x\n", word);

    return 0;
}

The above apparently innocent piece of code is not reliable:

$ gcc -O3 -fno-strict-aliasing -otest test.c
$ ./test
cdab
$ gcc -O3 -fstrict-aliasing -otest test.c
$ ./test
abcd

No peace for developers...
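For comparison, here is a minimal sketch of the same byte swap written so that it does not depend on the -fstrict-aliasing setting. It copies the word through an array of unsigned char with memcpy instead of casting the pointer to an unrelated struct type. The helper name swap_bytes is hypothetical, and the sketch assumes 8-bit bytes (which uint8_t already implies).

```c
#include <stdint.h>
#include <string.h>

/* Swap the two bytes of a 16-bit word by copying it through a byte array.
   memcpy may access the representation of any object, so the optimizer
   cannot assume that `word` is unaffected by the copy. */
uint16_t swap_bytes(uint16_t word)
{
    unsigned char bytes[sizeof word];
    unsigned char tmp;

    memcpy(bytes, &word, sizeof word);   /* word -> raw bytes */
    tmp = bytes[0];
    bytes[0] = bytes[1];
    bytes[1] = tmp;
    memcpy(&word, bytes, sizeof word);   /* raw bytes -> word */

    return word;
}
```

Calling swap_bytes(0xabcd) should give 0xcdab regardless of byte order, since exchanging the two bytes yields the same value on little-endian and big-endian machines.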

+2  A: 

Yes, this will work normally. However, as soon as you write to foreign.header all bets are off as to the contents of foreign.point, even if a particular action works on a given compiler.

rlbond
What do you mean by "all bets are off"? It should work just fine as a `foreign.header`. He just can't write into `header` and then read from `point` and expect it to do the same thing on all platforms. But... he doesn't want to do that.
alex tingle
Updated to reflect what I really meant. Thanks for pointing that out.
rlbond
+1  A: 

Yes, that should be perfectly portable. The fact that foreign is a union never even comes into it, since you never use it as a header.

(You can't write to header and then read from point and expect it to work the same on all platforms. But then, you don't want to do that, so you should be fine.)

alex tingle
+1  A: 

Yes, this is entirely reasonable. The ANSI C standard indicates that you shouldn't write one "type" into a union and read out another and expect to get something reliable, except under very specific circumstances. Here you want to write something into a union in one way and then read it out in the same way. You are guaranteed no padding at the start of the union, and appropriate pointer alignment, so as I understand it, you should be just fine with this.

Tim
While that's technically true, code everywhere relies on using unions to (safely, so the optimizer doesn't miss the alias) get the bits out of a float or double, etc... The fact that software relies on architecture specifics that aren't covered by the ANSI C99 standard does not make it "unreliable"
Andy Ross
A: 

Assuming double, int, and enum are represented consistently (I won't swear to it, but I believe double follows an IEEE standard), it should work reliably. However, the size of int varies with the system word size, and as far as I know enums can't be relied on to have a specific size.

Paul Nathan
`double` precision floating point is standardized in IEEE754.
Crashworks
Paul Nathan
More precisely, C/C++'s implementation on any CPU you are likely to encounter uses the IEEE754 data types. I'm assuming he isn't planning to port this code to an antique Cray J90 supercomputer.
Crashworks
A: 

Yes, you are guaranteed that identical structures have the same size and alignment requirements.

Loadmaster
A: 

As written, yes, it will work as you expect on any single platform.

A more typical definition of foreign would wrap the union in a struct that includes a type discriminator field so that the valid branch of the union is known explicitly at run time for each value of that type.
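A minimal sketch of that wrapping, under stated assumptions: the names foreign_kind, tagged_foreign, make_point and get_point are hypothetical and not part of the external library, and an_enum is declared as a plain int only because the definition of enum enum_t is not shown.

```c
/* Hypothetical discriminator: records which branch of the union is valid. */
enum foreign_kind { FOREIGN_HEADER, FOREIGN_POINT };

struct tagged_foreign {
    enum foreign_kind kind;
    union {
        struct {
            int an_enum;   /* stand-in for enum enum_t, whose definition is unknown */
            int an_int;
        } header;
        struct {
            double x, y;
        } point;
    } value;
};

/* Build a value whose active branch is `point`. */
struct tagged_foreign make_point(double x, double y)
{
    struct tagged_foreign f;

    f.kind = FOREIGN_POINT;
    f.value.point.x = x;
    f.value.point.y = y;
    return f;
}

/* Return 1 and store the coordinates only if the point branch is active;
   return 0 and leave *x and *y untouched otherwise. */
int get_point(const struct tagged_foreign *f, double *x, double *y)
{
    if (f->kind != FOREIGN_POINT)
        return 0;
    *x = f->value.point.x;
    *y = f->value.point.y;
    return 1;
}
```

With the tag in place, a reader checks kind before touching a branch, so the member last written is always the one that gets read.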

Where it gets interesting is when you wish to communicate the value of a foreign from platform A to platform B and get the expected data back out again. There you run into alignment, size, and byte order differences at minimum, and possibly even numeric representation differences as the standard doesn't actually require IEEE floats or 2's complement binary integers.

In practice, things are not that bad, but it becomes a portability concern that is best mitigated by platform-specific test cases, compile-time assertions, or both if a binary interchange format is necessary.
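As a rough illustration of the compile-time assertions mentioned above, the sketch below uses C11 _Static_assert to pin down the layout assumptions for struct pair_t from the question. The particular conditions checked (8-bit bytes, 8-byte double, no padding) are assumptions chosen for the example, not requirements of the standard.

```c
#include <limits.h>
#include <stddef.h>

struct pair_t {
    double x, y;
};

/* C11 _Static_assert: compilation fails if any assumption does not hold
   on the target platform, instead of the program misbehaving at run time. */
_Static_assert(CHAR_BIT == 8, "bytes are assumed to be octets");
_Static_assert(sizeof(double) == 8, "double is assumed to be 64 bits");
_Static_assert(offsetof(struct pair_t, y) == sizeof(double),
               "no padding expected between x and y");
_Static_assert(sizeof(struct pair_t) == 2 * sizeof(double),
               "no trailing padding expected");
```

These checks say nothing about byte order or floating-point format, which would still need run-time tests or a defined interchange encoding.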

Alternatively, marshaling the data from a platform-specific struct or union into a well-defined sequence of octets for storage and transmission is the robust answer. This is the approach taken by the MPEG standard, for instance.
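A small sketch of what marshaling into octets can look like for a single 32-bit integer field, written with shifts so the result does not depend on the host byte order. The function names put_u32_be and get_u32_be are hypothetical, and big-endian order is simply the convention picked for the example; encoding a double portably needs more care and is not attempted here.

```c
#include <stdint.h>

/* Encode a 32-bit value as four octets, most significant octet first. */
void put_u32_be(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)((v >> 24) & 0xff);
    out[1] = (unsigned char)((v >> 16) & 0xff);
    out[2] = (unsigned char)((v >> 8) & 0xff);
    out[3] = (unsigned char)(v & 0xff);
}

/* Decode four octets written by put_u32_be back into a 32-bit value. */
uint32_t get_u32_be(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) |
           ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |
            (uint32_t)in[3];
}
```

Because only arithmetic on values is used, never a cast of the integer's address, the same octet sequence comes out on any platform.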

RBerteig
+5  A: 

As you have written it, I believe it should work with virtually any compiler on any architecture. However, I do think it technically violates the strict aliasing rules. You're casting between unrelated pointer types, so an overly aggressive optimizer might reorder certain memory reads and writes because it assumes that certain pointers won't alias each other.

Unfortunately, I don't think there's a way to make this code absolutely bulletproof against strict aliasing, assuming that you can't modify the definition of foreign_t. Since the inner struct doesn't have a name, there's no way you can construct a pointer to it that the compiler will assume is aliasable. In practice, though, I don't think you'll see problems with your code.
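One commonly suggested way to sidestep the pointer cast is to copy the bytes with memcpy rather than write through a struct pair_t pointer. The sketch below is only that, a sketch: it still assumes struct pair_t has the same layout as the unnamed point struct (true in practice, but not something the standard promises for distinct struct types), the helper foreign_set_point is hypothetical, and enum enum_t is given a placeholder definition because the real one is not shown.

```c
#include <string.h>

enum enum_t { ENUM_PLACEHOLDER };   /* stand-in: the real definition is unknown */

union foreign_t {
    struct {
        enum enum_t an_enum;
        int an_int;
    } header;
    struct {
        double x, y;
    } point;
};

struct pair_t {
    double x, y;
};

/* The copy below must not run past the end of the union. */
_Static_assert(sizeof(struct pair_t) <= sizeof(union foreign_t),
               "pair_t must fit inside foreign_t");

/* Copy the bytes of *p into *f. No pair_t pointer ever aliases the union,
   so the strict aliasing rules are not involved. */
void foreign_set_point(union foreign_t *f, const struct pair_t *p)
{
    memcpy(f, p, sizeof *p);
}
```

After foreign_set_point(&foreign, &pair), reading foreign.point.x and foreign.point.y gives back the pair's values as long as the layout assumption holds.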

Adam Rosenfield
I was thinking along these lines too, but I'm not confident enough about the strict aliasing rules to post an answer invoking them. I believe that because the unnamed struct inside the union and the `struct pair_t` are different types, the compiler can assume that a write through a `struct pair_t` object can't affect the value of a `union foreign_t` object.
caf
I was struggling with padding and alignment issues while the major problem is related to optimization. Thanks for the link.
ntd
A: 

is it safe to assume the following code fragment will work as expected on different platforms and with different compilers?

The plain answer is no...

For each machine and compiler you need to find out what command will align the struct correctly; this is one of the major portability problems.

Arabcoder