I have a binary file of doubles that I need to load using C++. My problem is that it was written in big-endian format, and my machine is little-endian, so reading it with the fstream >> operator gives the wrong numbers. Byte swapping seems simple enough for integers, but the solutions I have found don't work for doubles and floats. How can I (or should I) fix this?

I read this as a reference for integer byte swapping:
http://stackoverflow.com/questions/105252/how-do-i-convert-between-big-endian-and-little-endian-values-in-c

EDIT: Though these answers are enlightening, I have found that my problem is with the file itself and not the format of the binary data. I believe my byte swapping does work; I was just getting confusing results. Thanks for your help!

+5  A: 

The most portable way is to serialize in textual format so that you don't have byte order issues. This is how operator>> works, so you shouldn't be having any endian issues with >>. The principal problem with binary formats (which would explain endian problems) is that floating point numbers consist of a number of mantissa bits, a number of exponent bits and a sign bit. The exponent may use an offset. This means that a straight byte re-ordering may not be sufficient, depending on the source and target formats.

If you are using IEEE-754 on both machines then you may be OK with a straight byte reversal, as this standard specifies a bit-string interchange format that should be portable (byte order issues aside).
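As a rough sketch, you can at least check the reading side at compile time. The helper name `host_double_is_ieee754_binary64` is made up for illustration, and this says nothing about the machine that wrote the file; that still has to come from the file's documentation.

```cpp
#include <climits>
#include <limits>

// Hypothetical helper: true when the host's double is a 64-bit
// IEC 559 (IEEE-754) type, as reported by the standard library.
constexpr bool host_double_is_ieee754_binary64() {
    return std::numeric_limits<double>::is_iec559
        && sizeof(double) * CHAR_BIT == 64;
}

// Refuse to compile on a host where plain byte reversal would not be enough.
static_assert(host_double_is_ieee754_binary64(),
              "double is not IEEE-754 binary64 on this host");
```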

If you have to convert between two machine architectures and you have to use a raw byte memory dump, then so long as the basic number format is the same (i.e. they have the same bit counts in each part of the number), you can read the data into an array of unsigned char, use some basic byte and bit swapping routines to correct the storage format and then copy the raw bytes into a variable of the appropriate type.

Charles Bailey
You don't have to serialize in a textual format - you can convert a floating point number to three integers (sign, mantissa and exponent) using a value-oriented rather than representation-oriented method, and then serialize those integers to network byte order in the usual way.
caf
I'm curious: are there any chips, even non-mainstream ones, that aren't IEEE-754?
Will
@caf: You can skip a step and convert them into two values: signed mantissa and exponent. But the standard way of (de)serializing in network byte order merits an answer.
Potatoswatter
@caf: I'm struggling to think of a common value-oriented representation for integers that isn't textual and isn't subject to endian issues. Do you mean something like what UTF-8 does for large Unicode code points?
Charles Bailey
For integers it is sufficient to *define* the endianness (and size) - hence "network byte order". By "value-oriented" I was referring to the process of extracting the components of a floating point number (i.e. using functions like `ilogb()` or equivalent rather than directly masking out parts of the underlying representation).
caf
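The value-oriented idea from the comments above could look roughly like this. It is only a sketch: it uses `std::frexp`/`std::ldexp` rather than `ilogb()`, the names `DoubleParts`, `decompose`, `recompose` and `put_be64` are made up for illustration, and it assumes finite inputs with at most 53 significant bits (NaN and infinities are not handled, and the sign of -0.0 is lost).

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical container: value == mantissa * 2^exponent.
struct DoubleParts {
    std::int64_t mantissa;  // signed integer mantissa
    std::int32_t exponent;  // power-of-two exponent
};

// Split a finite double into integers without touching its bit layout.
inline DoubleParts decompose(double x) {
    int exp = 0;
    double frac = std::frexp(x, &exp);  // x == frac * 2^exp, 0.5 <= |frac| < 1
    // Scaling by 2^53 turns the fraction into an exact integer.
    std::int64_t m = static_cast<std::int64_t>(std::ldexp(frac, 53));
    return DoubleParts{m, static_cast<std::int32_t>(exp - 53)};
}

// Rebuild the double from the two integers.
inline double recompose(DoubleParts p) {
    return std::ldexp(static_cast<double>(p.mantissa), p.exponent);
}

// Write a 64-bit value most significant byte first ("network byte order").
// Shifts work on the value, so the host's own byte order does not matter.
inline void put_be64(std::uint64_t v, unsigned char* out) {
    for (int i = 0; i < 8; ++i) {
        out[i] = static_cast<unsigned char>((v >> (56 - 8 * i)) & 0xFF);
    }
}
```

The mantissa can be passed to `put_be64` after a cast to `std::uint64_t`; the exponent would need a similar 32-bit routine.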
A: 

The standard conversion operators do not work with binary data, so it's not clear how you got where you are.

However, since byte swapping operates on bytes, not numbers, you perform it on data destined to become floats exactly as you would on data destined to become integers.

And since text is so inefficient and floating-point data sets tend to be so large, it's entirely reasonable to want this.

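int32_t raw_bytes;
stream.read( reinterpret_cast< char * >( & raw_bytes ), sizeof raw_bytes ); // unformatted read: not an int, just 32 bits of bytes
my_byte_swap( raw_bytes ); // swap 'em (my_byte_swap is your own routine)
float f;
std::memcpy( & f, & raw_bytes, sizeof f ); // copy the bits into the float (needs <cstring>); avoids the aliasing problem of a pointer cast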
Potatoswatter