views: 995
answers: 7

I'm working on a file format that should be written and read on several different operating systems and computers. Some of those computers will be x86 machines, others x86-64. Other processors may exist, but I'm not concerned about them yet.

This file format should contain several numbers that would be read like this:

struct LongAsChars{
    char c1, c2, c3, c4;
};

long readLong(FILE* file){
    int b1 = fgetc(file);
    int b2 = fgetc(file);
    int b3 = fgetc(file);
    int b4 = fgetc(file);
    if(b1<0||b2<0||b3<0||b4<0){
        //throwError
    }

    LongAsChars lng;
    lng.c1 = (char) b1;
    lng.c2 = (char) b2;
    lng.c3 = (char) b3;
    lng.c4 = (char) b4;

    long* value = (long*) &lng;

    return *value;
}

and written as:

void writeLong(long x, FILE* f){
    long* xptr = &x;
    LongAsChars* lng = (LongAsChars*) xptr;
    fputc(lng->c1, f);
    fputc(lng->c2, f);
    fputc(lng->c3, f);
    fputc(lng->c4, f);
}

Although this seems to be working on my computer, I'm concerned that it may not work on others, or that the file format may end up being different across computers (32-bit vs. 64-bit computers, for example). Am I doing something wrong? How should I implement my code to use a constant number of bytes per number?

Should I just use fread (which would possibly make my code faster, too) instead?

+7  A: 

Use the types in stdint.h to ensure you get the same number of bytes in and out.

Then you're just left with dealing with endianness issues, which your code probably doesn't handle.

Serializing the long through an aliased char* leaves you with different byte orders in the written file on platforms with different endianness.

You should decompose the value into bytes something like so (unsigned char avoids sign-extension surprises):

unsigned char c1 = (val >>  0) & 0xff;
unsigned char c2 = (val >>  8) & 0xff;
unsigned char c3 = (val >> 16) & 0xff;
unsigned char c4 = (val >> 24) & 0xff;

And recompose them using something like this (the casts keep the shifts from overflowing when int is narrower than long):

val = ((unsigned long) c4 << 24) |
      ((unsigned long) c3 << 16) |
      ((unsigned long) c2 <<  8) |
      ((unsigned long) c1 <<  0);
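
Putting the two halves together, a minimal sketch of complete read/write routines (the function names are illustrative; assumes uint32_t from stdint.h and 8-bit bytes):

#include <stdint.h>
#include <stdio.h>

// write a 32-bit value in little-endian order, independent of host byte order
void writeUint32(uint32_t val, FILE* f){
    fputc((int)((val >>  0) & 0xff), f);
    fputc((int)((val >>  8) & 0xff), f);
    fputc((int)((val >> 16) & 0xff), f);
    fputc((int)((val >> 24) & 0xff), f);
}

// read it back; returns 0 on success, -1 on EOF/error
int readUint32(uint32_t* out, FILE* f){
    uint32_t val = 0;
    for (int i = 0; i < 4; ++i){
        int b = fgetc(f);
        if (b == EOF) return -1;
        val |= (uint32_t) b << (8 * i);
    }
    *out = val;
    return 0;
}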
Michael Burr
I think a union works much better.
GMan
@GMan - don't you have the same problem with a union (unless you conditionally compile a different definition of the union based on the platform's endianness)?
Michael Burr
The reference to stdint was very useful and it'll help a lot!
luiscubal
Use unsigned chars or sign extension will bite you.
George Phillips
@George - are you sure? However, now that you mention it, I think the recompose example will have a problem if sizeof(int) < sizeof(long). I'll fix that in a bit...
Michael Burr
+1  A: 

You might also run into issues with endianness. Why not just use something like NetCDF or HDF, which take care of any portability issues that may arise?
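
For illustration, a minimal HDF5 sketch of storing one 32-bit integer with a fixed on-disk byte order (the file and dataset names here are made up; the library converts to the native in-memory type on read):

#include <hdf5.h>
#include <stdint.h>

void writeValue(int32_t v){
    hid_t file = H5Fcreate("data.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hsize_t dims[1] = {1};
    hid_t space = H5Screate_simple(1, dims, NULL);
    // the on-disk type is fixed (32-bit little-endian) no matter the host
    hid_t dset = H5Dcreate2(file, "value", H5T_STD_I32LE, space,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    H5Dwrite(dset, H5T_NATIVE_INT32, H5S_ALL, H5S_ALL, H5P_DEFAULT, &v);
    H5Dclose(dset);
    H5Sclose(space);
    H5Fclose(file);
}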

Pete
+1  A: 

Rather than using structures with characters in them, consider a more mathematical approach:

long l  = fgetc(file) << 24;
     l |= fgetc(file) << 16;
     l |= fgetc(file) <<  8;
     l |= fgetc(file) <<  0;

This is a little more direct and clear about what you are trying to accomplish. It can also be implemented in a loop to handle larger numbers.
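
For example, a loop version that reads an n-byte big-endian value might look like this (a sketch; the helper name is made up):

#include <stdio.h>
#include <stdint.h>

// read an n-byte big-endian unsigned value (n <= 8)
uint64_t readBigEndian(FILE* file, int n){
    uint64_t value = 0;
    for (int i = 0; i < n; ++i){
        int b = fgetc(file);
        if (b == EOF){
            //throwError
        }
        value = (value << 8) | (uint64_t) b;
    }
    return value;
}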

Chris Arguin
This reads the file in big-endian format, which is maybe a good thing, but it would still be faster to read a whole `long` and then `bswap` it in memory.
ephemient
@ephemient: assuming you need to bswap it (what if you are big-endian?). Also assuming bswap works (what if your long is 64 bits? Or you are on some forsaken middle-endian machine?)
Chris Arguin
Well, I was thinking "`bswap` if necessary", but that's obviously not what I wrote, and I try not to think about middle-endian machines (have they existed in the last two decades?) What about `s/bswap/ntohl/`? As far as I can tell, common implementations of it drop the high 32 bits if given a 64-bit value, which is the right thing to do.
ephemient
+2  A: 

Well you can use a union, for one:

union LongAsChars{
    long l;
    struct { char c1, c2, c3, c4; } c; // wrapped in a struct so the chars don't all alias the first byte
};

And it's more traditional to use an array, I think:

union LongAsChars{
    long l;
    char c[4];
};

Which makes your routine something like this (no compiler on-hand to test):

long readLong(FILE* file){

    LongAsChars lng;

    for (unsigned i = 0; i < 4; ++i)
    {
        int b = fgetc(file); // keep the int result so EOF is distinguishable from a 0xff byte
        if (b == EOF)
        {
            //throwError
        }

        lng.c[i] = (char) b;
    }

    return lng.l;
}

void writeLong(long x, FILE* f){

    LongAsChars lng;
    lng.l = x;

    for (unsigned i = 0; i < 4; ++i)
    {
        fputc(lng.c[i], f);
    }
}

The only issues you'll get with standard types deal with endianness.

Also, unless I'm missing something, yes, just read and write the long value directly, no need to chop things up, which at best just makes things confusing:

long readLong(FILE* file){

    long x;

    fread(&x, sizeof(long), 1, file);

    return x;
}

void writeLong(long x, FILE* file){

    fwrite(&x, sizeof(long), 1, file);
}
GMan
The last (simple) code goes wrong if the file is written on a platform where sizeof(long) == 4, such as 64-bit Windows, but read on a platform where sizeof(long) == 8, such as 64-bit Linux.
Steve Jessop
A: 

You don't want to use long int. That can be different sizes on different platforms, so is a non-starter for a platform-independent format. You have to decide what range of values needs to be stored in the file. 32 bits is probably easiest.

You say you aren't worried about other platforms yet. I'll take that to mean you want to retain the possibility of supporting them, in which case you should define the byte-order of your file format. x86 is little-endian, so you might think that's the best. But big-endian is the "standard" interchange order if anything is, since it's used in networking.

If you go for big-endian ("network byte order"):

#include <assert.h>
#include <limits.h>     // CHAR_BIT
#include <stdint.h>     // uint32_t
#include <arpa/inet.h>  // htonl/ntohl (winsock2.h on Windows)

// can't be bothered to support really crazy platforms: it is in
// any case difficult even to exchange files with 9-bit machines,
// so we'll cross that bridge if we come to it.
assert(CHAR_BIT == 8);
assert(sizeof(uint32_t) == 4);

{
    // write value
    uint32_t value = 23;
    const uint32_t networkOrderValue = htonl(value);
    fwrite(&networkOrderValue, sizeof(uint32_t), 1, file);
}

{
    // read value
    uint32_t networkOrderValue;
    fread(&networkOrderValue, sizeof(uint32_t), 1, file);
    uint32_t value = ntohl(networkOrderValue);
}

Actually, you don't even need to declare two variables; it's just a bit confusing to replace "value" with its network-order equivalent in the same variable.

It works because "network byte order" is defined to be whatever arrangement of bits results in an interchangeable (big-endian) order in memory. No need to mess with unions because any stored object in C can be treated as a sequence of char. No need to special-case for endianness because that's what ntohl/htonl are for.

If this is too slow, you can start thinking about fiendishly optimised platform-specific byte-swapping, with SIMD or whatever. Or using little-endian, on the assumption that most of your platforms will be little-endian and so it's faster "on average" across them. In that case you'll need to write or find "host to little-endian" and "little-endian to host" functions, which of course on x86 just do nothing.
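
A sketch of what such helpers might look like (hypothetical names; written with shifts so they are correct on any host, and compilers typically reduce them to a plain store/load on little-endian machines):

#include <stdint.h>

// store a value into a byte buffer in little-endian order
static void store_le32(uint32_t value, unsigned char out[4])
{
    out[0] = (unsigned char)(value >>  0);
    out[1] = (unsigned char)(value >>  8);
    out[2] = (unsigned char)(value >> 16);
    out[3] = (unsigned char)(value >> 24);
}

// load a little-endian value from a byte buffer
static uint32_t load_le32(const unsigned char in[4])
{
    return ((uint32_t)in[0] <<  0) |
           ((uint32_t)in[1] <<  8) |
           ((uint32_t)in[2] << 16) |
           ((uint32_t)in[3] << 24);
}

You'd then fwrite/fread the 4-byte buffer rather than the uint32_t itself.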

Steve Jessop
A: 

I believe the most cross-architecture approach is to use the uintXX_t types defined in stdint.h. See the man page here. For example, an int32_t will give you a 32-bit integer on both x86 and x86-64. I use these by default in all of my code now and have had no trouble, as they are fairly standard across all *NIX.
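
For instance (the PRI* macros from inttypes.h supply the matching printf formats):

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    int32_t  a = -42;  // exactly 32 bits on every platform that provides it
    uint64_t b = 995;  // exactly 64 bits
    printf("a = %" PRId32 ", b = %" PRIu64 "\n", a, b);
    printf("sizeof a = %zu, sizeof b = %zu\n", sizeof a, sizeof b);
    return 0;
}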

James
A: 

Assuming sizeof(uint32_t) == 4, there are 4! = 24 possible byte orders, of which little-endian and big-endian are the most prominent examples, but others have been used as well (e.g. PDP-endian).

Here are functions for reading and writing 32 bit unsigned integers from a stream, heeding an arbitrary byte order which is specified by the integer whose representation is the byte sequence 0,1,2,3: endian.h, endian.c

The header defines these prototypes

_Bool read_uint32(uint32_t * value, FILE * file, uint32_t order);
_Bool write_uint32(uint32_t value, FILE * file, uint32_t order);

and these constants

LITTLE_ENDIAN
BIG_ENDIAN
PDP_ENDIAN
HOST_ORDER
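
The linked files aren't reproduced here, but one plausible sketch of the write side, assuming each constant packs the byte positions as described (e.g. LITTLE_ENDIAN as 0x03020100, BIG_ENDIAN as 0x00010203):

#include <stdint.h>
#include <stdio.h>

// byte i of 'order' names which byte of 'value' is written at file position i
_Bool write_uint32(uint32_t value, FILE * file, uint32_t order)
{
    for (unsigned i = 0; i < 4; ++i)
    {
        unsigned idx = (order >> (8 * i)) & 0xffu;
        if (fputc((int)((value >> (8 * idx)) & 0xffu), file) == EOF)
            return 0;
    }
    return 1;
}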
Christoph