How can a floating-point number be converted into a sequence of bytes so that it can be persisted in a file? The algorithm must be fast and highly portable, and it must also allow the opposite operation, deserialization. It would be nice if it required only a very small excess of bits per value (persistent space).

+4  A: 

sprintf, fprintf? You don't get any more portable than that.
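
A minimal sketch of the text approach, assuming the values are IEEE-754 doubles (17 significant digits are then enough to get the same value back) and the default "C" locale. The helper names double_to_text and text_to_double are made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format x as decimal text with 17 significant
   digits. Returns what snprintf returns: the number of characters
   needed, excluding the terminating '\0'. */
int double_to_text(double x, char *buf, size_t size)
{
    return snprintf(buf, size, "%.17g", x);
}

/* Hypothetical helper: parse the text back into a double. */
double text_to_double(const char *buf)
{
    return strtod(buf, NULL);
}
```

The price is space: a double formatted this way can take up to 24 characters instead of 8 bytes.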

boytheo
It is not an efficient solution; it requires much more persistent space than the same numbers need when represented in RAM.
psihodelia
What's not effective about it; there are potentially serious complications with trying to save the floating point numbers directly, doing it as strings is pretty much standard operating procedure.
Carl Norum
I would prefer something like an excess bit indicating the endianness and an excess bit or two indicating the number of bytes per floating-point number, maybe also some excess bits to indicate the mantissa/exponent type (e.g. IEEE 754-2008).
psihodelia
Well, why don't you just do that then?
Steve Jessop
It may require more space, but it's both human readable and machine readable, endian-agnostic, and theoretically limitless with regards to the precision required.
dreamlax
More importantly @dreamlax, it is Floating Point Format agnostic.
sixlettervariables
+2  A: 

Take a look at Google Protocol Buffers. Granted it's C++, but you can get some ideas from there. The code is BSD licensed, so you can do whatever with it :)

Nikolai N Fetissov
I need C, not C++. I am interested in an algorithm, not in a software library.
psihodelia
That's what I said - take a look at the float encoding and use it if it fits your needs.
Nikolai N Fetissov
A: 

What level of portability do you require? If the file is to be read on a computer with the same OS that it was generated on, then using a binary file and just saving and restoring the bit pattern should work. Otherwise, as boytheo said, ASCII is your friend.

David Harris
ASCII is very inefficient
psihodelia
+2  A: 

Converting to an ascii representation would be the simplest, but if you need to deal with a colossal number of floats, then of course you should go binary. But this can be a tricky issue if you care about portability. Floating point numbers are represented differently in different machines.

If you don't want to use a canned library, then your float-binary serializer/deserializer will simply have to have "a contract" on where each bit lands and what it represents.

Here's a fun website to help with that: link.
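
As an illustration of such a "contract", here is a sketch that splits a double into the three fields of the IEEE-754 binary64 layout (1 sign bit, 11 exponent bits, 52 fraction bits). It assumes double really is binary64 and has the same byte order as uint64_t on the host; the struct and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical struct: the three fields of an IEEE-754 binary64 value. */
struct ieee_fields {
    unsigned sign;      /* 1 bit */
    unsigned exponent;  /* 11 bits, biased by 1023 */
    uint64_t fraction;  /* 52 bits; the leading 1 is implicit, not stored */
};

/* Assumption: double is binary64 and shares its byte order with uint64_t. */
struct ieee_fields split_double(double x)
{
    uint64_t bits;
    struct ieee_fields f;

    memcpy(&bits, &x, sizeof bits);  /* copy the object representation */
    f.sign     = (unsigned)(bits >> 63);
    f.exponent = (unsigned)((bits >> 52) & 0x7FF);
    f.fraction = bits & ((UINT64_C(1) << 52) - 1);
    return f;
}
```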

Angelo
+5  A: 

Assuming you're using mainstream compilers, floating point values in C and C++ obey the IEEE standard and, when written in binary form to a file, can be recovered on any other platform, provided that you write and read using the same byte endianness. So my suggestion is: pick an endianness of choice, and before writing or after reading, check whether that endianness is the same as on the current platform; if not, just swap the bytes.
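
A rough sketch of this idea, with one variation: instead of detecting the host endianness and swapping, it builds the bytes with shifts on a 64-bit integer, which yields big-endian output on any host. It assumes double is IEEE-754 binary64 and has the same byte order as uint64_t; put_double_be and get_double_be are made-up names.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store x as 8 bytes, most significant byte first. */
void put_double_be(double x, unsigned char out[8])
{
    uint64_t bits;
    int i;

    memcpy(&bits, &x, sizeof bits);  /* assumes sizeof(double) == 8 */
    for (i = 0; i < 8; i++)
        out[i] = (unsigned char)((bits >> (56 - 8 * i)) & 0xFF);
}

/* Hypothetical helper: rebuild the double from 8 big-endian bytes. */
double get_double_be(const unsigned char in[8])
{
    uint64_t bits = 0;
    double x;
    int i;

    for (i = 0; i < 8; i++)
        bits = (bits << 8) | in[i];
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Because the shifts work on the integer's value rather than on its memory layout, no explicit endianness check is needed.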

Fabio Ceconello
according to the C99 spec, annex F, conforming implementations should define `__STDC_IEC_559__`, which in principle could be used as a compile-time check, but is useless in practice as there are issues with gcc ( http://gcc.gnu.org/c99status.html , scroll down to 'Further Issues')
Christoph
Compilers don't necessarily dictate the IEEE floating point format. There are still computers which use other formats, unfortunately (VAX/Alpha, IBM). But +1 for ensuring you have the endianness right.
sixlettervariables
Right, but they have to know the format used by the platform to support it in the RTL. Also, many platforms (these days especially embedded) don't have a math coprocessor, so they do dictate the format in the accompanying emulation lib. So I thought it'd be easier to refer to the compiler.
Fabio Ceconello
Wouldn't it make sense to treat the platforms that don't support the IEEE standard as exceptions, and do the necessary conversions only there, when the (rare) version for them is needed? Here's a good article about the differences: http://www.codeproject.com/KB/applications/libnumber.aspx
Fabio Ceconello
+1  A: 

What do you mean, "portable"?

For portability, remember to keep the numbers within the limits defined in the Standard: use a single number outside these limits, and there goes all portability down the drain.

double planck_time = 5.39124E-44; /* second */


5.2.4.2.2 Characteristics of floating types <float.h>

[...]
10   The values given in the following list shall be replaced by constant
     expressions with implementation-defined values [...]
11   The values given in the following list shall be replaced by constant
     expressions with implementation-defined values [...]
12   The values given in the following list shall be replaced by constant
     expressions with implementation-defined (positive) values [...]
[...]

Note the implementation-defined in all these clauses.
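
To make that concrete: the Standard only requires DBL_MAX to be at least 1E+37 and DBL_MIN to be at most 1E-37, so a nonzero value is only guaranteed to be a normal double everywhere if its magnitude lies between those two bounds. A small sketch of such a check (the function name is hypothetical):

```c
/* Hypothetical helper: returns 1 if x is zero or its magnitude lies within
   the range every conforming implementation must support for double
   (1E-37 to 1E+37), and 0 otherwise (including for NaN). */
int in_guaranteed_range(double x)
{
    double a = (x < 0.0) ? -x : x;

    return a == 0.0 || (a >= 1e-37 && a <= 1e37);
}
```

The planck_time value above fails this check, although a typical IEEE-754 double holds it without trouble.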

pmg
+2  A: 

You could always convert to IEEE-754 format in a fixed byte order (either little endian or big endian). For most machines, that would require either nothing at all or a simple byte swap to serialize and deserialize. A machine that doesn't support IEEE-754 natively will need a converter written, but doing that with ldexp and frexp (standard C library functions) and bit shuffling is not too tough.
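
A sketch of what such a converter might look like, using only frexp, ldexp and integer arithmetic, so it does not depend on the host's own floating point layout. The function names are made up, and several simplifications are assumptions of this sketch: values below the normal binary64 range are flushed to zero when encoding, -0.0 is stored as +0.0, extra precision is truncated, and NaN patterns decode to HUGE_VAL.

```c
#include <float.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical helper: encode x as an IEEE-754 binary64 bit pattern. */
uint64_t encode_binary64(double x)
{
    uint64_t sign = 0, frac;
    int e;
    double m;

    if (x != x)                          /* NaN: emit a quiet NaN pattern */
        return UINT64_C(0x7FF8000000000000);
    if (x < 0.0) { sign = 1; x = -x; }
    if (x == 0.0)
        return 0;
    if (x > DBL_MAX)                     /* infinity */
        return (sign << 63) | (UINT64_C(0x7FF) << 52);

    m = frexp(x, &e);                    /* x = m * 2^e, 0.5 <= m < 1 */
    e += 1022;                           /* biased exponent of 2m * 2^(e-1) */
    if (e >= 2047)                       /* too large: infinity */
        return (sign << 63) | (UINT64_C(0x7FF) << 52);
    if (e <= 0)                          /* too small: flush to zero */
        return sign << 63;
    frac = (uint64_t)ldexp(m * 2.0 - 1.0, 52);  /* 52 fraction bits */
    return (sign << 63) | ((uint64_t)e << 52) | frac;
}

/* Hypothetical helper: decode a binary64 bit pattern into a double. */
double decode_binary64(uint64_t bits)
{
    int e = (int)((bits >> 52) & 0x7FF);
    uint64_t frac = bits & ((UINT64_C(1) << 52) - 1);
    double x;

    if (e == 0)
        x = ldexp((double)frac, -1074);  /* zero or subnormal */
    else if (e == 2047)
        x = HUGE_VAL;                    /* infinity (NaN not distinguished) */
    else
        x = ldexp((double)(frac | (UINT64_C(1) << 52)), e - 1075);
    return (bits >> 63) ? -x : x;
}
```

The resulting 64-bit integer can then be written byte by byte in whichever fixed order the file format prescribes.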

Chris Dodd
The problem comes with FP standards that lack some of the "features" of IEEE. Namely the VAX and IBM floating point formats...You're in for a world of hurt w.r.t. corner cases. Thankfully, people have written excellent converters which handle these cases gracefully (I'm looking at you USGS! I owe you a beer).
sixlettervariables
An ANSI compliant frexp function hides most of that for you. Of course, you may end up with cases where serialization and deserialization gives you a (close but) different value.
Chris Dodd
A: 

This might give you a good start - it packs a floating point value into an int and long long pair, which you can then serialise in the usual way.

#include <math.h>   /* frexp, ldexp, fabs */
#include <stdlib.h> /* llabs */

#define FRAC_MAX 9223372036854775807LL /* 2**63 - 1 */

struct dbl_packed
{
    int exp;
    long long frac;
};

void pack(double x, struct dbl_packed *r)
{
    double xf = fabs(frexp(x, &r->exp)) - 0.5;

    if (xf < 0.0)
    {
        r->frac = 0;
        return;
    }

    r->frac = 1 + (long long)(xf * 2.0 * (FRAC_MAX - 1));

    if (x < 0.0)
        r->frac = -r->frac;
}

double unpack(const struct dbl_packed *p)
{
    double xf, x;

    if (p->frac == 0)
        return 0.0;

    xf = ((double)(llabs(p->frac) - 1) / (FRAC_MAX - 1)) / 2.0;

    x = ldexp(xf + 0.5, p->exp);

    if (p->frac < 0)
        x = -x;

    return x;
}
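
One possible way to serialise the pair "in the usual way" is to write both integers in a fixed byte order. The sketch below writes the exponent as 4 bytes and the fraction as 8 bytes, most significant byte first, in two's complement. The function names are made up, and the struct is repeated only so that the sketch compiles on its own; it assumes int has at least 32 bits and long long has exactly 64.

```c
#include <stdint.h>

struct dbl_packed { int exp; long long frac; };  /* as defined above */

/* Hypothetical helper: 4 bytes of exp, then 8 bytes of frac, big-endian. */
void dbl_packed_to_bytes(const struct dbl_packed *p, unsigned char out[12])
{
    uint32_t e = (uint32_t)p->exp;   /* signed-to-unsigned is well defined */
    uint64_t f = (uint64_t)p->frac;
    int i;

    for (i = 0; i < 4; i++)
        out[i] = (unsigned char)((e >> (24 - 8 * i)) & 0xFF);
    for (i = 0; i < 8; i++)
        out[4 + i] = (unsigned char)((f >> (56 - 8 * i)) & 0xFF);
}

/* Hypothetical helper: inverse of dbl_packed_to_bytes. */
void dbl_packed_from_bytes(const unsigned char in[12], struct dbl_packed *p)
{
    uint32_t e = 0;
    uint64_t f = 0;
    int i;

    for (i = 0; i < 4; i++)
        e = (e << 8) | in[i];
    for (i = 0; i < 8; i++)
        f = (f << 8) | in[4 + i];

    /* Rebuild negative values without implementation-defined casts. */
    p->exp  = (e & UINT32_C(0x80000000)) ? -(int)(~e) - 1 : (int)e;
    p->frac = (f >> 63) ? -(long long)(~f) - 1 : (long long)f;
}
```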
caf
A: 

This version has an overhead of only one byte per floating point value, to indicate the endianness. However, I think it is still not very portable.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>

#define LITEND      'L'
#define BIGEND      'B'

typedef short               INT16;
typedef int                 INT32;
typedef double              vec1_t;

 typedef struct {
    FILE            *fp;
} WFILE, RFILE;

#define w_byte(c, p)    putc((c), (p)->fp)
#define r_byte(p)       getc((p)->fp)

static void w_vec1(vec1_t v1_Val, WFILE *p)
{
    INT32   i;
    char    *pc_Val;

    pc_Val = (char *)&v1_Val;

    w_byte(LITEND, p);
    for (i = 0; i<sizeof(vec1_t); i++)
    {
        w_byte(pc_Val[i], p);
    }
}


static vec1_t r_vec1(RFILE *p)
{
    INT32   i;
    vec1_t  v1_Val = 0.0;   /* defined result if the tag is not LITEND */
    char    c_Type,
            *pc_Val;

    pc_Val = (char *)&v1_Val;

    c_Type = r_byte(p);
    if (c_Type==LITEND)
    {
        for (i = 0; i<sizeof(vec1_t); i++)
        {
            pc_Val[i] = r_byte(p);
        }
    }
    return v1_Val;
}

int main(void)
{
    WFILE   x_FileW,
            *px_FileW = &x_FileW;
    RFILE   x_FileR,
            *px_FileR = &x_FileR;

    vec1_t  v1_Val;
    INT32   l_Val;
    char    *pc_Val = (char *)&v1_Val;
    INT32   i;

    px_FileW->fp = fopen("test.bin", "wb");   /* binary mode: no newline translation */
    v1_Val = 1234567890.0987654321;
    printf("v1_Val before write = %.20f \n", v1_Val);
    w_vec1(v1_Val, px_FileW);
    fclose(px_FileW->fp);

    px_FileR->fp = fopen("test.bin", "rb");   /* binary mode: no newline translation */
    v1_Val = r_vec1(px_FileR);
    printf("v1_Val after read = %.20f \n", v1_Val);
    fclose(px_FileR->fp);
    return 0;
}
psihodelia
It is portable only to machines sharing the same floating point format. Having been down this road, I will give you the following advice: **Standardize on Little Endian IEEE-754 and make everybody else convert to/from that if necessary**. You will be MUCH happier in the end. You will have portability through a rigid standard.
sixlettervariables
A: 

fwrite(), fread()? You will likely want binary, and you cannot pack the bytes any tighter unless you want to sacrifice precision, which you would do in the program and then fwrite()/fread() anyway: float a; double b; a=(float)b; fwrite(&a,1,sizeof(a),fp);

If you are carrying different floating point formats around, they may not convert in a straight binary sense, so you may have to pick apart the bits and perform the math (this to the power of that, plus this, etc.). IEEE 754 is a dreadful standard to use, but it is widespread, so it would minimize the effort.
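
A small sketch of the narrowing idea: convert to float in the program, then write the raw bytes. The helper names are hypothetical, and the file produced is only readable on machines with the same float format and byte order as the writer.

```c
#include <stdio.h>

/* Hypothetical helper: narrow b to float and write its raw bytes.
   Returns 1 on success, 0 on failure. */
int write_as_float(FILE *fp, double b)
{
    float a = (float)b;   /* precision is sacrificed here */

    return fwrite(&a, sizeof a, 1, fp) == 1;
}

/* Hypothetical helper: read one raw float and widen it back to double.
   Returns 1 on success, 0 on failure. */
int read_as_float(FILE *fp, double *b)
{
    float a;

    if (fread(&a, sizeof a, 1, fp) != 1)
        return 0;
    *b = a;
    return 1;
}
```

The stream should be opened in binary mode ("wb" / "rb") so that no byte is translated on the way.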

dwelch