views:

415

answers:

3

Hi there,

I used the following piece of code to read data from files as part of a larger program.

double data_read(FILE *stream,int code) {
        char data[8];
        switch(code) {
        case 0x08:
            return (unsigned char)fgetc(stream);
        case 0x09:
            return (signed char)fgetc(stream);
        case 0x0b:
            data[1] = fgetc(stream);
            data[0] = fgetc(stream);
            return *(short*)data;
        case 0x0c:
            for(int i=3;i>=0;i--)
                data[i] = fgetc(stream);
            return *(int*)data;
        case 0x0d:
            for(int i=3;i>=0;i--)
                data[i] = fgetc(stream);
            return *(float*)data;
        case 0x0e:
            for(int i=7;i>=0;i--)
                data[i] = fgetc(stream);
            return *(double*)data;
        }
        die("data read failed");
        return 1;
    }

Now I am told to use -O2 and I get following gcc warning: warning: dereferencing type-punned pointer will break strict-aliasing rules

Googleing I found two orthogonal answers:

vs

In the end I don't want to ignore the warnings. What would you reccomend?

[update] I substituted the toy example with the real function.

+4  A: 

It looks a lot as if you really want to use fread:

int data;
fread(&data, sizeof(data), 1, stream);

That said, if you do want to go the route of reading chars, then reinterpreting them as an int, the safe way to do it is to use a union:

union
{
    char theChars[4];
    int theInt;
} myunion;

for(int i=0; i<4; i++)
    myunion.theChars[i] = fgetc(stream);
return myunion.theInt;

I'm not sure why the length of data in your original code is 3. I assume you wanted 4 bytes; at least I don't know of any systems where an int is 3 bytes.

Note that both your code and mine are highly non-portable.

Edit: If you want to read ints of various lengths from a file, portably, try something like this:

unsigned result=0;
for(int i=0; i<4; i++)
    result = (result << 8) | fgetc(stream);

(Note: In a real program, you would additionally want to test the return value of fgetc() against EOF.)

This reads a 4-byte unsigned from the file in little-endian format, regardless of what the endianness of the system is. It should work on just about any system where an unsigned is at least 4 bytes.

If you want to be endian-neutral, don't use pointers or unions; use bit-shifts instead.

Martin B
+1. To stress again: an union is an official way to keep the code strict aliasing compliant. This is not gcc specific, it's just gcc's optimizer is more broken in the respect. The warnings shouldn't be ignored: either disable explicitly -fstrict-aliasing optimization or fix the code.
Dummy00001
@Martin I fixed the '3-byte-int'. Would a union be portable?
Framester
@Framester: Depends on what you want to port to. Most desktop systems and kin mean the same thing by a 32-bit `int`, but some are big-endian and some are small-endian, meaning the order of the bytes in the `int` can vary.
David Thornley
@David: Just to pick a nit: The usual term is "little-endian".
Martin B
@Martin B: Thanks; don't know why I typed that.
David Thornley
A: 

Basically you can read gcc's message as guy you are looking for trouble, don't say I didn't warn ya.

Casting a three byte character array to an int is one of the worst things I have seen, ever. Normally your int has at least 4 bytes. So for the fourth (and maybe more if int is wider) you get random data. And then you cast all of this to a double.

Just do none of that. The aliasing problem that gcc warns about is innocent compared to what you are doing.

Jens Gustedt
Hi, I substituted the toy example with the real function. And the int with 3 bytes was just a typo from me.
Framester
A: 

Apparently the standard allows sizeof(char*) to be different from sizeof(int*) so gcc complains when you try a direct cast. void* is a little special in that everything can be converted back and forth to and from void*. In practice I don't know many architecture/compiler where a pointer is not always the same for all types but gcc is right to emit a warning even if it is annoying.

I think the safe way would be

int i, *p = &i;
char *q = (char*)&p[0];

or

char *q = (char*)(void*)p;

You can also try this and see what you get:

char *q = reinterpret_cast<char*>(p);
Sebastien Mirolo
`reinterpret_cast` is C++. This is C.
ptomato