How can I serialize doubles and floats in C?

I have the following code for serializing shorts, ints, and chars.

unsigned char * serialize_char(unsigned char *buffer, char value)
{
    buffer[0] = value;
    return buffer + 1;
}

unsigned char * serialize_int(unsigned char *buffer, int value)
{
    buffer[0] = value >> 24;
    buffer[1] = value >> 16;
    buffer[2] = value >> 8;
    buffer[3] = value;
    return buffer + 4;
}

unsigned char * serialize_short(unsigned char *buffer, short value)
{
    buffer[0] = value >> 8;
    buffer[1] = value;
    return buffer + 2;
}

Edit:

I found these functions from this question

Edit 2:

The purpose of serializing is to send data to a UDP socket and guarantee that it can be deserialized on the other machine even if the endianness is different. Are there any other "best practices" to perform this functionality given that I have to serialize ints, doubles, floats, and char*?

+3  A: 

I first saw the cast used in my example below in the good old Quake source code, in the "rsqrt" routine, which contains the coolest comment I'd seen at the time (Google it, you'll like it).

unsigned char * serialize_float(unsigned char *buffer, float value) 
{ 
    unsigned int ivalue = *((unsigned int*)&value); // warning assumes 32-bit "unsigned int"
    buffer[0] = ivalue >> 24;  
    buffer[1] = ivalue >> 16;  
    buffer[2] = ivalue >> 8;  
    buffer[3] = ivalue;  
    return buffer + 4; 
} 

I hope I've understood your question (and example code) correctly. Let me know if this was useful.
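A variant of the same idea that avoids the pointer cast might look like the sketch below. It copies the bits with memcpy, which sidesteps the aliasing question, and uses uint32_t so the 32-bit assumption is explicit. The names serialize_float_memcpy and deserialize_float_memcpy are made up for illustration, and the code still assumes a 32-bit IEEE-754 float on both ends.

```c
#include <stdint.h>
#include <string.h>

/* Sketch only: assumes float is 32 bits wide and IEEE-754 on both machines. */
unsigned char *serialize_float_memcpy(unsigned char *buffer, float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);      /* copy the raw bits of the float */
    buffer[0] = (unsigned char)(bits >> 24); /* most significant byte first */
    buffer[1] = (unsigned char)(bits >> 16);
    buffer[2] = (unsigned char)(bits >> 8);
    buffer[3] = (unsigned char)bits;
    return buffer + 4;
}

/* Hypothetical counterpart: rebuilds the float from 4 big-endian bytes. */
const unsigned char *deserialize_float_memcpy(const unsigned char *buffer, float *value)
{
    uint32_t bits = ((uint32_t)buffer[0] << 24) |
                    ((uint32_t)buffer[1] << 16) |
                    ((uint32_t)buffer[2] << 8)  |
                     (uint32_t)buffer[3];
    memcpy(value, &bits, sizeof bits);
    return buffer + 4;
}
```

Both functions return the advanced buffer pointer, so they can be chained with the serialize_int and serialize_short functions from the question.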

S.C. Madsen
I would add `char assumes_sz_float_eq_sz_int[(2*(int)(sizeof(int)==sizeof(float)))-1];` at the top of the function.
David X
@David X As a compile-time check? Good idea, I usually do that trick with enums, but i guess a negative array length works equally well
S.C. Madsen
@SCMadsen, that would be `enum{foo=0,bar=(condition)};`, right? That actually might be better, although you get an extra namespace pollutant with `foo`. (Also, the cast to `int` above is useless, i think i thought the comparison was returning a `size_t` for some reason.)
David X
@David X I use enum{bar=1/(condition)}; And if it's placed inside the .c/.cpp file I don't think it pollutes any namespaces. I thought sizeof returned size_t too...
S.C. Madsen
@SCMadsen, yeah, that looks like the best option. `sizeof` does return `size_t`, but for some reason i thought the operator `==` on two `size_t`s would return a `size_t`, but it returns an `int`, so the cast to int is pointless.
David X
+3  A: 

Following your update, you mention the data is to be transmitted using UDP and ask for best practices. I would highly recommend sending the data as text, perhaps even with some markup added (XML). Debugging endian-related errors across a transmission line is a waste of everybody's time.

Just my 2 cents on the "best practices" part of your question

S.C. Madsen
Although sending plain-text would be nice, one of the requirements is to use as little bandwidth as possible.
Trevor
@Trevor Sending text does not necessarily mean much extra bandwidth. For example, sending the integer 1 takes 4 bytes (on your platform) when sent as an int and 2 (assuming a separator) when sent as text. So this tends to even out. And text is far, far simpler to handle and debug.
anon
Okidoki, then use the example-code I showed in my earlier answer, and let me know if it works for ya
S.C. Madsen
In the end, using plain text with a delimiter was the easiest way to go. It's only a few extra bytes per message compared with serializing floats and doubles into the message. Thanks.
Trevor
A: 

You can always use unions to serialize:

void serialize_double (unsigned char* buffer, double x) {
    int i;
    union {
        double         d;
        unsigned char  bytes[sizeof(double)];
    } u;

    u.d = x;
    for (i=0; i<sizeof(double); ++i)
        buffer[i] = u.bytes[i];
}

This isn't really any more robust than simply casting the address of the double to a char*, but at least by using sizeof() throughout the code you avoid problems when a data type takes up more or fewer bytes than you thought it did (this doesn't help if you are moving data between platforms that use different sizes for double).

For floats, simply replace all instances of double with float. You may be able to build a crafty macro to auto-generate a series of these functions, one for each data type you are interested in.
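Since the question asks for a byte stream that survives a difference in endianness, here is a hedged sketch of a variant that fixes the byte order on the wire. It is a different technique from the union above: it copies the double into a uint64_t with memcpy and then shifts the bytes out most significant first. The function names are hypothetical, and it assumes a 64-bit IEEE-754 double whose byte order matches the machine's integer byte order.

```c
#include <stdint.h>
#include <string.h>

/* Sketch only: assumes double is a 64-bit IEEE-754 value on both machines. */
unsigned char *serialize_double_be(unsigned char *buffer, double x)
{
    uint64_t bits;
    int i;

    memcpy(&bits, &x, sizeof bits);          /* raw bits of the double */
    for (i = 0; i < 8; ++i)                  /* most significant byte first */
        buffer[i] = (unsigned char)(bits >> (56 - 8 * i));
    return buffer + 8;
}

/* Hypothetical counterpart: reads 8 big-endian bytes back into a double. */
const unsigned char *deserialize_double_be(const unsigned char *buffer, double *x)
{
    uint64_t bits = 0;
    int i;

    for (i = 0; i < 8; ++i)
        bits = (bits << 8) | buffer[i];
    memcpy(x, &bits, sizeof bits);
    return buffer + 8;
}
```

Because the shifts work on the integer value rather than on memory, the bytes come out in the same order on little-endian and big-endian hosts.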

bta
+1  A: 

For the narrow question about float, note that you probably end up assuming that both ends of the wire are using the same representation for floating point. This might be safe today given the pervasive use of IEEE-754, but note that some current DSPs (I believe blackfins) use a different representation. In the olden days, there were at least as many representations for floating point as there were manufacturers of hardware and libraries, so this was a bigger issue.

Even with the same representation, it might not be stored with the same byte order. That will necessitate deciding on a byte order on the wire, and tweaked code at each end. Either the type-punned pointer cast or the union will work in practice. Strictly speaking, the pointer cast breaks C's aliasing rules (undefined behavior) and the union relies on implementation-specific behavior, but as long as you check and test, that is not a big deal.

That said, text is often your friend for transferring floating point between platforms. The trick is to not use too many more characters than are really needed to convert it back.
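As a rough sketch of that trick: 17 significant decimal digits are enough to recover an IEEE-754 double exactly (9 are enough for a float), so "%.17g" is a reasonable upper bound on what to print. The helper names below are made up for illustration, and the round trip assumes the C library converts with correct rounding and that both ends use the same decimal point character (the "C" locale).

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: writes x as decimal text.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if the buffer is too small or formatting failed. */
int double_to_text(char *out, size_t outsize, double x)
{
    int n = snprintf(out, outsize, "%.17g", x);
    return (n < 0 || (size_t)n >= outsize) ? -1 : n;
}

/* Hypothetical helper: parses text produced by double_to_text. */
double text_to_double(const char *text)
{
    return strtod(text, NULL);
}
```

A 32-byte buffer is comfortably large enough for any double printed this way.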

All in all, I'd recommend giving some serious consideration to using a library such as XDR that is robust, has been around for a while, and has been rubbed up against all of the sharp corners and edge cases.

If you insist on rolling your own, take care about subtle issues like whether int is 16 bits, 32 bits, or even 64 bits in addition to representation of float and double.

RBerteig
AFAIK the blackfin has no FPU.
S.C. Madsen
@S.C, Maybe I'm remembering a TI DSP then. All of IEEE-754 would still be more costly than you'd like to implement in a DSP, after all.
RBerteig
A: 

To start, you should never assume that short, int, etc. have the same width on both sides. It would be much better to use uint32_t and the other (unsigned) fixed-width types, which have a known width on both sides.

Then, to be sure that you don't have problems with endianness, there are the macros/functions ntohl, htons, etc., which are usually much more efficient than anything you can write on your own (on Intel hardware they are, e.g., just one assembler instruction). So you don't have to write conversion functions; they are basically already there. Just cast your buffer pointer to a pointer of the correct integer type.

For float you can probably assume that they are 32 bits wide and have the same representation on both sides. So I think a good strategy would be to use a pointer cast to uint32_t* and then the same strategy as above.

If you think you might have different representations of float you would have to split into mantissa and exponent. Probably you could use frexpf for that.
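A minimal sketch of the htonl/ntohl strategy might look like this. Note one deliberate difference from the description above: it uses memcpy instead of casting the buffer pointer, because a position in the middle of a network buffer is not guaranteed to be suitably aligned for uint32_t. The names put_u32, get_u32 and put_float are made up for illustration, <arpa/inet.h> is the POSIX header (Windows uses a different one), and put_float assumes a 32-bit IEEE-754 float.

```c
#include <arpa/inet.h>  /* htonl, ntohl (POSIX) */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: stores value in network (big-endian) byte order. */
unsigned char *put_u32(unsigned char *buffer, uint32_t value)
{
    uint32_t net = htonl(value);
    memcpy(buffer, &net, sizeof net);   /* safe for unaligned buffers */
    return buffer + sizeof net;
}

/* Hypothetical helper: reads a value stored by put_u32. */
const unsigned char *get_u32(const unsigned char *buffer, uint32_t *value)
{
    uint32_t net;
    memcpy(&net, buffer, sizeof net);
    *value = ntohl(net);
    return buffer + sizeof net;
}

/* Sketch only: assumes float is 32 bits wide and IEEE-754 on both sides. */
unsigned char *put_float(unsigned char *buffer, float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return put_u32(buffer, bits);
}
```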

Jens Gustedt
A: 

This packs a floating point value into an int and long long pair, which you can then serialise with your other functions. The unpack() function is used to deserialise.

The pair of numbers represent the exponent and fractional part of the number respectively.

#include <math.h>    /* frexp, ldexp, fabs */
#include <stdlib.h>  /* llabs */

#define FRAC_MAX 9223372036854775807LL /* 2**63 - 1 */

struct dbl_packed
{
    int exp;
    long long frac;
};

void pack(double x, struct dbl_packed *r)
{
    double xf = fabs(frexp(x, &r->exp)) - 0.5;

    if (xf < 0.0)
    {
        r->frac = 0;
        return;
    }

    r->frac = 1 + (long long)(xf * 2.0 * (FRAC_MAX - 1));

    if (x < 0.0)
        r->frac = -r->frac;
}

double unpack(const struct dbl_packed *p)
{
    double xf, x;

    if (p->frac == 0)
        return 0.0;

    xf = ((double)(llabs(p->frac) - 1) / (FRAC_MAX - 1)) / 2.0;

    x = ldexp(xf + 0.5, p->exp);

    if (p->frac < 0)
        x = -x;

    return x;
}
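To put the pair on the wire, one option is to write the exponent as 4 bytes and the fraction as 8 bytes, both big-endian. The sketch below is only one possible layout, not part of the code above: the function names are hypothetical, and it assumes int is at least 32 bits and long long is 64 bits, with negative values sent in two's complement form.

```c
#include <stdint.h>

/* Hypothetical helper: 4 bytes of exp followed by 8 bytes of frac, big-endian. */
unsigned char *serialize_packed(unsigned char *buffer, int exp, long long frac)
{
    uint32_t e = (uint32_t)exp;   /* conversion to unsigned is well defined */
    uint64_t f = (uint64_t)frac;
    int i;

    for (i = 0; i < 4; ++i)
        *buffer++ = (unsigned char)(e >> (24 - 8 * i));
    for (i = 0; i < 8; ++i)
        *buffer++ = (unsigned char)(f >> (56 - 8 * i));
    return buffer;
}

/* Hypothetical counterpart: reads the 12 bytes written above. */
const unsigned char *deserialize_packed(const unsigned char *buffer,
                                        int *exp, long long *frac)
{
    uint32_t e = 0;
    uint64_t f = 0;
    int i;

    for (i = 0; i < 4; ++i)
        e = (e << 8) | *buffer++;
    for (i = 0; i < 8; ++i)
        f = (f << 8) | *buffer++;

    /* Rebuild negative values without an out-of-range unsigned-to-signed cast. */
    *exp  = (e & 0x80000000u) ? -(int)(uint32_t)~e - 1 : (int)e;
    *frac = (f & 0x8000000000000000ull) ? -(long long)(uint64_t)~f - 1
                                        : (long long)f;
    return buffer;
}
```

The exp and frac members of struct dbl_packed can be passed straight to these functions after pack() and handed to unpack() after reading.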
caf
+1  A: 

The portable way: use frexp to serialize (convert to integer mantissa and exponent) and ldexp to deserialize.

The simple way: assume in 2010 any machine you care about uses IEEE float, declare a union with a float element and a uint32_t element, and use your integer serialization code to serialize the float.

The binary-file-haters way: serialize everything as text, floats included. Use the "%a" printf format specifier to get a hex float, which is always expressed exactly (provided you don't limit the precision with something like "%.4a") and not subject to rounding errors. You can read these back with strtod or any of the scanf family of functions.
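For the text route, a minimal sketch of the "%a" round trip could look like the following. The helper names are made up for illustration, and it assumes a C99 library on both ends (for "%a" in snprintf and hex-float parsing in strtod). The exact spelling of the hex string can differ between implementations, so only the round trip should be relied on, not the text itself.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: writes x as a C99 hex float.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if the buffer is too small or formatting failed. */
int double_to_hex_text(char *out, size_t outsize, double x)
{
    int n = snprintf(out, outsize, "%a", x);   /* no precision: exact output */
    return (n < 0 || (size_t)n >= outsize) ? -1 : n;
}

/* Hypothetical helper: parses a hex float written by double_to_hex_text. */
double hex_text_to_double(const char *text)
{
    return strtod(text, NULL);
}
```

A float can be passed to the same function, since it is promoted to double without loss.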

R..
`%a` isn't in C89, but is in C99. Notably, C99 also handles NaN and infinities better by specifying how `printf` formats them and `scanf` reads them.
RBerteig
Good point. If you need C89 compatibility, just write your own `printf("%a", f)` code. It only takes about 20 lines if you don't need support for non-finite arguments, and 10-15 more if you do. Unlike printing floating point numbers in decimal, printing them in hex is very easy and the trivial implementation does what you expect (i.e. it actually works).
R..
A: 

You can portably serialize in IEEE-754 regardless of the native representation:

#include <float.h>  /* DBL_MAX */
#include <stdio.h>  /* FILE, fputc, ferror */

int fwriteieee754(double x, FILE * fp, int bigendian)
{
    int                     shift;
    unsigned long           sign, exp, hibits, hilong, lowlong;
    double                  fnorm, significand;
    int                     expbits = 11;
    int                     significandbits = 52;

    /* zero (can't handle signed zero) */
    if(x == 0) {
        hilong = 0;
        lowlong = 0;
        goto writedata;
    }
    /* infinity */
    if(x > DBL_MAX) {
        hilong = 1024 + ((1 << (expbits - 1)) - 1);
        hilong <<= (31 - expbits);
        lowlong = 0;
        goto writedata;
    }
    /* -infinity */
    if(x < -DBL_MAX) {
        hilong = 1024 + ((1 << (expbits - 1)) - 1);
        hilong <<= (31 - expbits);
        hilong |= (1UL << 31);  /* 1UL: shifting a signed int into the sign bit is undefined */
        lowlong = 0;
        goto writedata;
    }
    /* NaN - dodgy because many compilers optimise out this test
     * isnan() is C99, POSIX.1 only, use it if you will.
     */
    if(x != x) {
        hilong = 1024 + ((1 << (expbits - 1)) - 1);
        hilong <<= (31 - expbits);
        lowlong = 1234;
        goto writedata;
    }

    /* get the sign */
    if(x < 0) {
        sign = 1;
        fnorm = -x;
    } else {
        sign = 0;
        fnorm = x;
    }

    /* get the normalized form of f and track the exponent */
    shift = 0;
    while(fnorm >= 2.0) {
        fnorm /= 2.0;
        shift++;
    }
    while(fnorm < 1.0) {
        fnorm *= 2.0;
        shift--;
    }

    /* check for denormalized numbers */
    if(shift < -1022) {
        while(shift < -1022) {
            fnorm /= 2.0;
            shift++;
        }
        shift = -1023;
    } else {
        /* take the significant bit off mantissa */
        fnorm = fnorm - 1.0;
    }
    /* calculate the integer form of the significand */
    /* hold it in a  double for now */

    significand = fnorm * ((1LL << significandbits) + 0.5f);

    /* get the biased exponent */
    exp = shift + ((1 << (expbits - 1)) - 1);   /* shift + bias */

    /* put the data into two longs */
    hibits = (long)(significand / 4294967296);  /* 0x100000000 */
    hilong = (sign << 31) | (exp << (31 - expbits)) | hibits;
    lowlong = (unsigned long)(significand - hibits * 4294967296);

 writedata:
    /* write the bytes out to the stream */
    if(bigendian) {
        fputc((hilong >> 24) & 0xFF, fp);
        fputc((hilong >> 16) & 0xFF, fp);
        fputc((hilong >> 8) & 0xFF, fp);
        fputc(hilong & 0xFF, fp);

        fputc((lowlong >> 24) & 0xFF, fp);
        fputc((lowlong >> 16) & 0xFF, fp);
        fputc((lowlong >> 8) & 0xFF, fp);
        fputc(lowlong & 0xFF, fp);
    } else {
        fputc(lowlong & 0xFF, fp);
        fputc((lowlong >> 8) & 0xFF, fp);
        fputc((lowlong >> 16) & 0xFF, fp);
        fputc((lowlong >> 24) & 0xFF, fp);

        fputc(hilong & 0xFF, fp);
        fputc((hilong >> 8) & 0xFF, fp);
        fputc((hilong >> 16) & 0xFF, fp);
        fputc((hilong >> 24) & 0xFF, fp);
    }
    return ferror(fp);
}

In machines using IEEE-754 (i.e. the common case), all you'll need to do to get the number is an fread(), provided the byte order you wrote matches the host's. Otherwise, decode the bytes yourself: (-1)^sign * 2^(exponent-1023) * 1.mantissa.

Note: when serializing in systems where the native double is more precise than the IEEE double, you might encounter off-by-one errors in the low bit.
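For the "decode the bytes yourself" case, a hedged sketch of a decoder is shown below. It works from a byte buffer rather than a FILE *, takes the bytes in big-endian order, and rebuilds the value with ldexp so it does not depend on the native representation. The function name is hypothetical, and it relies on the C99 NAN and INFINITY macros from <math.h>.

```c
#include <math.h>    /* ldexp, NAN, INFINITY */
#include <stdint.h>

/* Hypothetical decoder: 8 big-endian bytes of an IEEE-754 double -> native double. */
double decode_ieee754_be(const unsigned char *b)
{
    uint64_t bits = 0, mantissa;
    int i, sign, exponent;
    double value;

    for (i = 0; i < 8; ++i)
        bits = (bits << 8) | b[i];

    sign     = (int)(bits >> 63);
    exponent = (int)((bits >> 52) & 0x7FF);       /* 11 biased exponent bits */
    mantissa = bits & 0x000FFFFFFFFFFFFFull;      /* low 52 bits */

    if (exponent == 0x7FF)        /* all ones: infinity or NaN */
        value = mantissa ? NAN : INFINITY;
    else if (exponent == 0)       /* zero or denormal: no implicit leading 1 */
        value = ldexp((double)mantissa, -1074);
    else                          /* normal: 1.mantissa * 2^(exponent - 1023) */
        value = ldexp((double)(mantissa | (1ull << 52)), exponent - 1075);

    return sign ? -value : value;
}
```

The exponent offset of 1075 is the bias of 1023 plus the 52 places needed to turn the integer mantissa back into a fraction.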

Hope this helps.

Michael Foukarakis