I'm having trouble getting the following code to work correctly. Using an online IEEE-754 converter, I wrote (by hand) into testData.txt, the file being read, the bit string that should represent the floating-point number 75.5; the cout.write call confirms that the bit string is what I expect. However, when I try to coerce the char* into a float using a union (which I have seen described as a typical way to do this conversion), the resulting float is not the number I expect.

#include<climits>
#include<iostream>
#include<fstream>
#include<bitset>

int main( int, char** )
{

    std::ifstream inputFile( "testData.txt", std::ios_base::in | std::ios_base::binary );
    if( !inputFile ) std::cout << "Failed to open input file!" << std::endl;

    char buffer[ CHAR_BIT * sizeof(float) ];
    inputFile.read( buffer, CHAR_BIT * sizeof(float) );

    std::cout << "cout.write of input from file = ";
    std::cout.write( buffer, CHAR_BIT * sizeof(float) );
    std::cout << std::endl;

    union { float f; char* c; } fToCharStarUnion;

    fToCharStarUnion.c = buffer;
    std::bitset< sizeof(float) * CHAR_BIT > bits( std::string( fToCharStarUnion.c ) );
    std::cout << "fToCharStarUnion.f = " << fToCharStarUnion.f << " bits = " << bits << std::endl;

    inputFile.close();
    return 0;
}

The output of running this is:

cout.write of input from file = 01000010100101110000000000000000
fToCharStarUnion.f = -1.61821e+38 bits = 01000010100101110000000000000000

Is there something fundamental I am missing that would make this work correctly?

+4  A: 

Your union needs to include an array of char rather than a pointer.

union { float f; char c[sizeof(float)]; } float2char;

You will then also have to worry about endianness: is c[0] the exponent end of the float, or the tail of the mantissa? (The answer will vary depending on your hardware: Intel vs PPC or SPARC or ...)

Jonathan Leffler
I changed the union as above and used a memcpy call to copy the buffer onto the union's float member, but the answer is still not right. I am using an Intel machine. I see you didn't multiply the sizeof(float) by CHAR_BIT ... is it possible my problem has to do with my file being written out by hand in ASCII? I thought that was taken care of by the ifstream ios_base::binary flag, though.
bpw1621
I only need as many characters as there are bytes in the float, hence sizeof(float) is correct. You'd be allocating a 32-byte buffer on most machines if you multiplied by CHAR_BIT, but most float types are just 4 bytes long.
Jonathan Leffler
I understand what you're saying (and it makes sense), but if that is the case then I think something is wrong with the way I am "reading" the data. If I read in only sizeof(float) characters, I only get the first four characters, i.e. 0100 instead of 01000010100101110000000000000000
bpw1621
@bpw1621: have you read my answer yet?
Potatoswatter
@bpw1621: you have various levels of confusion... your data consists of a string of '1' and '0' characters, representing 1 bit of information per byte of input string. You have to compress that 32-byte string down to 32 consecutive bits. You can then think about extracting the data as a float. But while you have 8 times as much data as necessary, you are going to run into problems, and no union in the world is going to help much. As I said previously, I think @Potatoswatter is providing a good solution. My basic point (that you do not want a char pointer in the union) remains valid, but...
Jonathan Leffler
@bpw1621: well, I thought I'd submitted a comment that maybe @Potatoswatter's solution was more directly on target than mine. I can't now see it, so maybe I didn't hit 'add' after all.
Jonathan Leffler
@all: yeah, I think the best advice for @bpw now is just to avoid char*'s altogether. There is rarely a reason to use them in "pure" C++.
Potatoswatter
@all: you are right, I am confused on many levels because I usually do not have to write this type of code. I actually _will_ be dealing with binary data, but I was trying to write a test program to see how the iostream calls behaved. In the "real" thing I will be receiving a UDP buffer that I will need to read, determine the object type from a field in the packet, and deserialize into an appropriate object. I thought this was a toy example of that, but I guess I just obfuscated what I was trying to accomplish. Thanks for the help.
bpw1621
@bpw: are you sure you will be reading a string of '1's and '0's? More likely you will have raw binary data, and you need to read up on `#pragma pack()` and the like to generate deterministic `struct` layouts.
Potatoswatter
@Potatoswatter: absolutely correct. This was just a test program to determine how the stream calls work because, upon receipt of a packet, I will have to read ahead to extract an int, and from that int I can determine what object the packet gets deserialized into. If they were all packets "of the same type" I could just jam them into a fixed object, but each packet can represent any of a number of objects.
bpw1621
+3  A: 

You are translating the ASCII into bits using the constructor of bitset. That causes your decoded bits to be in the bitset object rather than the union. To get raw bits out of a bitset, use the to_ulong method:

#include<climits>
#include<iostream>
#include<fstream>
#include<bitset>
#include<string>

int main( int, char** )
{

    std::ifstream inputFile( "testData.txt",
       std::ios_base::in | std::ios_base::binary );
    if( !inputFile ) std::cout << "Failed to open input file!" << std::endl;

    char buffer[ CHAR_BIT * sizeof(float) ];
    inputFile.read( buffer, CHAR_BIT * sizeof(float) );

    std::cout << "cout.write of input from file = ";
    std::cout.write( buffer, CHAR_BIT * sizeof(float) );
    std::cout << std::endl;

    union {
        float f[ sizeof(unsigned long)/sizeof(float) ];
        unsigned long l;
    } funion;

    // buffer is not null-terminated, so pass its length explicitly
    funion.l = std::bitset<32>( std::string( buffer, sizeof buffer ) ).to_ulong();
    std::cout << "funion.f = " << funion.f[0]
       << " bits = " << std::hex <<funion.l << std::endl;

    inputFile.close();
    return 0;
}

This generally assumes that your FPU operates with the same endianness as the integer part of your CPU, and that sizeof(long) >= sizeof(float)… less guaranteed for double, and indeed the trick is harder to make portable for 32-bit machines with 64-bit FPUs.

Edit: now that I've made the members of the union equal sized, I see that this code is sensitive to endianness. The decoded float will be in the last element of the array on a big-endian machine and in the first element on a little-endian one. Maybe the best approach would be to give the integer member of the union exactly as many bits as the FP member, and perform a narrowing cast on the result of to_ulong. It is very difficult to maintain the standard of portability you seemed to be aiming for in the original code.

Potatoswatter
I have read your answer, and it does completely fix my problem, so thank you very much. I'll mark this as answered, but I wanted to see whether Jonathan Leffler was presenting something different before I did. I'd also rather use a solution that has a character array rather than an unsigned long int as the other member of the union (because it's not clear, to me at least, how to extend the union to any type other than float, whereas in the char[] case it is trivial).
bpw1621
@bpw: If you use a `char[]` to store a bit-array, each `char` object stores 8 bits. If you use a `long[]` to store a bit-array, each `long` object stores sizeof(long)*8 bits. Unfortunately, `bitset` will only return one `long`'s worth of bits to you, so you are essentially restricted to a `long[1]`. It's really a limit of `bitset` and you would need to code your own to get around it.
Potatoswatter
@bpw (from other thread): You can jam a single packet into a `union`, at least. All members of a union are guaranteed to have the same address. A series of packets can be a sequence of union pointers into a buffer. Use that initial `int` to determine what kind of object each union really is… this behavior is defined and guaranteed safe by C++ §9.5/1. However, don't forget to perform endian conversion on every member.
Potatoswatter
@Potatoswatter: That actually helps a lot. @Jonathan Leffler: Thanks for your help, your comments made it clear I was conflating a couple of different problems.
bpw1621