This is what I offered at an interview today.

int is_little_endian(void)
{
    union {
        long l;
        char c;
    } u;

    u.l = 1;

    return u.c == 1;
}

My interviewer insisted that c and l are not guaranteed to begin at the same address and therefore, the union should be changed to say char c[sizeof(long)] and the return value should be changed to u.c[0] == 1.
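For comparison, here is a sketch of the variant the interviewer asked for, reconstructed from the description above. The function name is kept from the original, and only the member declaration and the return expression change:

```c
/* The interviewer's suggested variant: the char member becomes an array
   spanning the whole long, and the test reads its first element. */
int is_little_endian(void)
{
    union {
        long l;
        char c[sizeof(long)];
    } u;

    u.l = 1;

    /* On a little-endian machine the least significant byte (value 1)
       is stored at the lowest address, i.e. in u.c[0]. */
    return u.c[0] == 1;
}
```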

Is it correct that members of a union might not begin at the same address?

+8  A: 

I was unsure about the members of the union, but SO came to the rescue.

The check can be better written as:

int is_bigendian(void) {
    const int i = 1;
    return (*(unsigned char*)&i) == 0;
}

Incidentally, the C FAQ shows both methods: How can I determine whether a machine's byte order is big-endian or little-endian?
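A minimal sketch putting the two approaches side by side (the function names are hypothetical). On ordinary machines both should give the same answer, although one inspects a `long` and the other an `int`:

```c
/* Union method: store 1 in a long and inspect the lowest-addressed byte. */
int little_endian_by_union(void)
{
    union {
        long l;
        unsigned char c[sizeof(long)];
    } u;

    u.l = 1;
    return u.c[0] == 1;
}

/* Pointer method: view an int through an unsigned char pointer,
   which points to the object's lowest-addressed byte. */
int little_endian_by_cast(void)
{
    const int i = 1;
    return *(const unsigned char *)&i == 1;
}
```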

Sinan Ünür
I believe the hairy pointer casting is technically undefined behavior, but I couldn't cite anything, and it should certainly work on most machines.
Chris Lutz
I'd be surprised if it were undefined; otherwise how would memcpy and most serialization code work?
Crashworks
@Chris I believe you have it reversed. Converting from a `char *` to `int *` can cause undefined behavior. I have a copy of the WG14/N1124 draft and if things haven't changed since then: *When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object.* (p.47, http://www.open-std.org/JTC1/SC22/wg14/www/docs/n1124.pdf)
Sinan Ünür
Okay. I don't have a copy (I'll get around to it one day) but I remembered hearing that the same trick from `float` to `int` in the Quake inverse square root function was undefined. I suppose converting between `char`s and `int`s is much more predictable, and thus defined.
Chris Lutz
@Chris clarification: Converting from a `char *` to `int *` would be undefined behavior if the two have different alignment requirements. But converting from any pointer type to `char *` is safe.
Sinan Ünür
@Chris: char is actually a special case in the standard, as a way of accessing the underlying representation of the other types.
caf
@Chris: "Hairy pointer casts", aka raw memory reinterpretation, are generally UB, *except* if you reinterpret the memory as an array of characters. The latter is explicitly allowed in C. However, when `char` is used (as opposed to `unsigned char`) the set of things you can do with reinterpreted memory is limited. The above code is generally UB, since it is UB to read the value through such a `char *` pointer - the value might be a trap representation. The proper code should have used a cast to `unsigned char*`.
AndreyT
@caf: That would be `unsigned char`, not `char`.
AndreyT
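To illustrate the point about character types, here is a small sketch (the helper names are hypothetical) that inspects any object through an `unsigned char` pointer rather than plain `char`. `unsigned char` has no padding bits, so every bit pattern it can hold is a valid value:

```c
#include <stddef.h>
#include <stdio.h>

/* Return byte `index` of an object's representation. */
unsigned char byte_at(const void *obj, size_t index)
{
    const unsigned char *p = obj;  /* lowest-addressed byte of the object */
    return p[index];
}

/* Print every byte of an object's representation in address order. */
void print_bytes(const void *obj, size_t size)
{
    for (size_t i = 0; i < size; i++)
        printf("%02x ", (unsigned)byte_at(obj, i));
    putchar('\n');
}
```

For example, `print_bytes(&l, sizeof l)` on a `long l = 1;` should print a line beginning with `01` on a little-endian machine.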
+1  A: 

The standard says the offsets for each item in a union are implementation defined.

When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values. ISO/IEC 9899:1999 Representations of Types 6.2.6.1, para 7

Therefore it's up to the compiler to choose where to put the char relative to the long within the union; they are not guaranteed to have the same address.

fbrereto
There is one exception here. A little further down (6.7.2.1 para 13): "The size of a union is sufficient to contain the largest of its members. The value of at most one of the members can be stored in a union object at any time. _A pointer to a union object, suitably converted, points to each of its members_ (or if a member is a bit-field, then to the unit in which it resides), and vice versa." Basically, a start address of the union is guaranteed to be the same as the start address of each of its members.
D.Shawley
Good point, I'll cease meddling with fbrereto's answer. I am confused now, though, because if you're right, then the code in the question should work.
Dana the Sane
The OP's code is fine: See http://stackoverflow.com/questions/891471/union-element-alignment
Sinan Ünür
I'm pretty sure that it will work and is guaranteed to do so. See my answer... I was sorta surprised by this one.
D.Shawley
+3  A: 

While your code would probably work in many compilers, the interviewer is right -- how to align fields in a union or struct is entirely up to the compiler, and in this case the char could be placed either at the "beginning" or the "end". The interviewer's code leaves no room for doubt and is guaranteed to work.

Kristoffon
A: 

I have a question about this...

how is

u.c[0] == anything

valid given:

union {
    long l;
    char c;
} u;

How does [0] work on a char?

Seems to me, it would be equivalent to: *(u.c + 0) == anything, which would be, well, crap, considering the value of u.c, treated as a pointer, would be crap.

(Unless perhaps, as it occurs to me now, some html crap code ate an ampersand in the original question...)
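With the interviewer's array declaration, the `[0]` works because `E1[E2]` is defined as `(*((E1)+(E2)))`: the array decays to a pointer to its first element. A minimal sketch (the function name is hypothetical) showing that both spellings name the same element:

```c
/* Returns 1 if the subscript form and the pointer-arithmetic form
   refer to the same element of the array member. */
int subscript_matches_pointer_form(void)
{
    union {
        long l;
        char c[sizeof(long)];  /* the interviewer's array declaration */
    } u;

    u.l = 1;

    /* Same address, and therefore the same stored value. */
    return &u.c[0] == u.c + 0 && u.c[0] == *(u.c + 0);
}
```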

smcameron
The interviewer said that `char c;` should be `char c[sizeof(long)];`, thus `u.c[0]` would be valid.
Chris Lutz
Ah, ok, that makes sense. Jesus, interviews suck.
smcameron
I would have done it: int x = 0x01020304; unsigned char *x = (char *) &x; return x[0] == 0x01;
smcameron
And I would have been dinged for not using uint32_t, and the wrong cast. LOL. (Have had a beer or two since getting off work.)
smcameron
Not to mention two variables called 'x'. Cripes.
smcameron
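A cleaned-up sketch of the idea in the comments above, with the problems they mention fixed: one name per object, `uint32_t` for a fixed width, and an `unsigned char` cast. The function name and the three-way return value are illustrative choices, not from the thread. Note that finding 0x01 at the lowest address indicates big-endian, not little-endian:

```c
#include <stdint.h>

/* Store a known 32-bit pattern and inspect the byte at the lowest address.
   Returns 1 for big-endian (0x01 first), 0 for little-endian (0x04 first),
   and -1 for anything else (e.g. a mixed byte order). */
int byte_order_from_pattern(void)
{
    const uint32_t x = 0x01020304;
    const unsigned char *bytes = (const unsigned char *)&x;

    if (bytes[0] == 0x01)
        return 1;
    if (bytes[0] == 0x04)
        return 0;
    return -1;
}
```

On a little-endian machine such as x86, this should return 0.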
A: 

While the interviewer is correct and this is not guaranteed to work by the spec, none of the other answers are guaranteed to work either, as dereferencing a pointer after casting it to another type yields undefined behavior.

In practice, this (and the other answers) will always work, as all compilers allow casting between pointer-to-union and pointer-to-member-of-union transparently -- much ancient code would fail to work if they did not.
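If the pointer-conversion question is a worry, one commonly used alternative (a swapped-in technique, not one proposed in this answer) is to copy the bytes with `memcpy` so that user code never dereferences a converted pointer. A minimal sketch, with a hypothetical function name:

```c
#include <string.h>

/* Copy the representation of a long into a byte array and check where
   the least significant byte landed. memcpy copies the bytes exactly as
   stored, so no pointer to a different type is dereferenced here. */
int is_little_endian_memcpy(void)
{
    const long one = 1;
    unsigned char bytes[sizeof one];

    memcpy(bytes, &one, sizeof one);
    return bytes[0] == 1;
}
```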

Chris Dodd
+6  A: 

You are correct in assuming that the members of a union begin at the same address. The relevant part of the Standard is (6.7.2.1 para 13):

The size of a union is sufficient to contain the largest of its members. The value of at most one of the members can be stored in a union object at any time. A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit-field, then to the unit in which it resides), and vice versa.

Basically, the start address of the union is guaranteed to be the same as the start address of each of its members. I believe (still looking for the reference) that a long is guaranteed to be larger than a char. If you assume this, then your solution should* be valid.

* I'm still a little uncertain due to some interesting wording around the representation of integer types and, in particular, signed integer types. Take a close read of 6.2.6.2 paragraphs 1 and 2.
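The guarantee quoted above can be illustrated with a short sketch (the union tag `probe` and the function name are made up for illustration) that checks each member of the question's union sits at offset 0 and shares the union's address:

```c
#include <stddef.h>

/* The union from the question, given a tag so offsetof can name it. */
union probe {
    long l;
    char c;
};

/* Returns 1 if every member starts at offset 0 and a converted pointer
   to the union compares equal to pointers to its members, as C99
   6.7.2.1 requires. Only addresses are compared; no value is read. */
int members_share_address(void)
{
    union probe u;

    return offsetof(union probe, l) == 0
        && offsetof(union probe, c) == 0
        && (void *)&u == (void *)&u.l
        && (void *)&u == (void *)&u.c;
}
```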

D.Shawley
A: 

Correct me if I am wrong, but local variables are not initialized to 0.

Isn't this better:

union {
    long l;
    char c;
} u={0,};
Arabcoder