ansaurus

Question

transferring binary files between systems

Answer 1

+2 A:

Method 1 should work. Just create a test vector with values 1, 2, ..., 10 and send it across. You can read the ASCII that was created (so you can validate the 'export') and therefore also check the 'import' step of re-reading the file. You may lose precision this way, but it should get you operational.

Method 2 will work once you use a library such as XDR that deals with the different endianness. These things used to be a bigger problem 'way back when', and there are solutions. This is, e.g., how systems like R permit you to share binary files between architectures.
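A minimal sketch of that export/import check, in C. The function names and the file name used to try it out are made up for illustration, and it assumes both machines use IEEE 754 doubles; under that assumption, printing 17 significant digits ("%.17g") should be enough to avoid the precision loss mentioned above.

```c
#include <assert.h>
#include <stdio.h>

/* Write n doubles as text, one per line. "%.17g" prints 17 significant
   digits, which is enough for an IEEE 754 double to survive the trip
   through text and back. */
int export_ascii(const char *path, const double *v, size_t n)
{
    assert(path != NULL && v != NULL);
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        fprintf(f, "%.17g\n", v[i]);
    return fclose(f) == 0 ? 0 : -1;
}

/* Read up to n doubles back. Returns the number of values parsed,
   or -1 if the file could not be opened. */
long import_ascii(const char *path, double *v, size_t n)
{
    assert(path != NULL && v != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    size_t count = 0;
    while (count < n && fscanf(f, "%lf", &v[count]) == 1)
        count++;
    fclose(f);
    return (long)count;
}
```

Fill an array with 1, 2, ..., 10, call export_ascii() on one machine, look at the file in a text editor, copy it over, and call import_ascii() on the other machine to compare the values.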

Method 3 is not needed unless you do something really awkward when transferring the file.

Dirk Eddelbuettel 2010-06-15 16:26:16

Thanks, but it seems the binaries depend on the architecture. I'm not a computer scientist, but I remember from my undergrad that there are different ways to store the sign of a floating point number. Does XDR work now? Where can I find it?

tim 2010-06-15 16:52:49

XDR is an encoding library by Sun; try googling for 'xdr library', which just got me a number of hits. ASCII export (method 1) will not need it. Export a set of known numbers to ASCII, check the file. Import from that file, check again. When you have that working, use it on your real data.

Dirk Eddelbuettel 2010-06-15 17:03:52

Answer 2

A:

Solutions 2 and 3 will generally not work, since different processors might use different internal representations of your numbers. For integers (not floats/doubles) you could get away with something that just takes care of the byte order of your different machines. Floating point representations are much trickier, and you would have to look up in detail what representations your different architectures use. Even then, for double for example, there is only a minimal requirement on the precision, and you might find yourself in a situation where you'd have to truncate to the smaller representation of the two. These problems have little to do with the OS you are using (Unix or not) and a lot to do with how the hardware likes to have things.

Jens Gustedt 2010-06-15 16:42:18

Thank you, so there is no other way? Do you have any idea why solution 1 does not work? Thank you anyway.

tim 2010-06-15 16:46:11

Solution 1 should usually work but is relatively expensive (time, bandwidth). Why your particular implementation of it didn't work, we can't know; you didn't give us the details.

Jens Gustedt 2010-06-18 10:04:03

Answer 3

+2 A:

The details provided are scarce. Answering to the best of my understanding.

.. one of the systems is IBM ppc997 and the other is AMD Opteron

The former system generally (*) uses big-endian representation, the latter little-endian. Read this.

(*) It depends on the OS. IBM's POWER CPU can do both little and big endian, but no OS actually running on them uses little-endian mode.

Normally, for a binary format one picks one endianness and sticks with it. For network data, big-endian number representation is the norm.

That means all places which do something like this:

/* writing to binary */
int a = 1234;
write(fd,&a,sizeof(a));
/* reading from binary */
int x;
read(fd,&x,sizeof(x));

should be converted to something like this:

/* needs <arpa/inet.h> for htonl/ntohl and <stdint.h> for uint32_t */
/* writing to binary */
uint32_t a = htonl(1234);
write(fd,&a,sizeof(a));
/* reading from binary */
uint32_t x;
read(fd,&x,sizeof(x));
x = ntohl(x);

Another approach is to save an endianness indicator along with the binary data (e.g. write a magic number and check it on the other side: MAGIC = 0x12345678 vs. MAGIC = 0x78563412), and apply the conversion only when the endianness differs. That approach is less elegant, though, and has no real advantages that I'm aware of.
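For completeness, the magic-number check could be sketched roughly like this. FILE_MAGIC reuses the value from the text; swap32(), needs_swap() and fix_byte_order() are hypothetical helper names, and the sketch assumes the writer stores the magic word in its native byte order as the first 32-bit value of the file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FILE_MAGIC 0x12345678u   /* value taken from the text above */

/* Reverse the byte order of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Inspect the magic word as read from the file header.
   Returns 0 if the file is in host byte order, 1 if every value
   has to be swapped, -1 if the header is not recognised. */
int needs_swap(uint32_t magic_from_file)
{
    if (magic_from_file == FILE_MAGIC)
        return 0;
    if (magic_from_file == swap32(FILE_MAGIC))
        return 1;
    return -1;
}

/* Convert n 32-bit values in place, swapping only when the magic
   word says the writer had the opposite byte order. */
int fix_byte_order(uint32_t magic_from_file, uint32_t *values, size_t n)
{
    assert(values != NULL || n == 0);
    int swap = needs_swap(magic_from_file);
    if (swap == 1)
        for (size_t i = 0; i < n; i++)
            values[i] = swap32(values[i]);
    return swap;
}
```

The reader pays for the swap only when the two machines differ, which is the one thing this scheme offers over always converting with htonl/ntohl.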

Dummy00001 2010-06-15 16:59:59

many thanks, I'm going to read about it.

tim 2010-06-15 17:19:25

Actually, for network stuff *big-endian* is the norm. It's even called "Network Byte Order". And since the OP is using floating-point types, there is more to it than just endianness.

caf 2010-06-16 02:35:39

@caf, thanks, dumb typo. I generally use big-endian exclusively (mostly for the standard ntoh/hton functions) thus I mix endianness all the time.

Dummy00001 2010-06-16 14:29:34

Answer 4

A:

All processors that support IEEE 754 have the same binary representation for floats (technically called singles) and doubles. The only difference will be in the endianness of the processor.

So the only incompatibility between the IBM PPC and the AMD Opteron should be the endianness of the doubles.

When you byteswap the doubles from disk to memory, DON'T DO THIS:

double swap(double a); // THIS IS NEVER THE RIGHT THING TO DO.

Passing in the double by value may pass it in through floating point registers. Because not all bit combinations are valid doubles, the processor may silently convert the double to a NaN, which may have a different bit representation than the value passed in. This is more likely to happen with a valid double that is in the opposite endian order. (See here for a more detailed explanation.)

In other words, pass the double you want to byteswap as a pointer or an array of chars. (Array of chars should be the best bet.)
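A minimal sketch of the char-array approach, assuming an 8-byte IEEE 754 double on both machines. The function names are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Reverse 8 bytes in place. The data stays in an unsigned char buffer,
   so it never passes through a floating-point register while it is
   still in the foreign byte order. */
void swap8_bytes(unsigned char *b)
{
    for (int i = 0; i < 4; i++) {
        unsigned char t = b[i];
        b[i] = b[7 - i];
        b[7 - i] = t;
    }
}

/* Turn 8 raw bytes written by an opposite-endian machine into a double. */
double double_from_foreign(const unsigned char *raw)
{
    unsigned char tmp[8];
    double d;
    assert(sizeof d == sizeof tmp);   /* assumes an 8-byte double */
    memcpy(tmp, raw, sizeof tmp);
    swap8_bytes(tmp);
    memcpy(&d, tmp, sizeof d);        /* only now is it treated as a double */
    return d;
}
```

Read the file into an unsigned char buffer with fread() and call double_from_foreign() on each 8-byte group; the value becomes a double only after its bytes are in host order.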

MSN 2010-06-15 17:40:34

The code is very interesting, but as far as I understand it is only for swapping and unswapping. That is good, but what I need is to use the swapped data on the destination system, while the article explains, and you also mention here, that using this code for that purpose is wrong. Thanks anyway.

tim 2010-06-15 23:34:05

@tim, What I mean is that you can byteswap doubles all you want, just don't pass doubles in the opposite endianness as doubles; pass them as an array of chars. So go ahead and byteswap.

MSN 2010-06-15 23:41:23

Oh, thank you, but it is complicated; I used XDR instead. Thanks again.

tim 2010-06-16 03:37:32

Answer 5

+2 A:

The code is not 100% portable if you are writing memory contents to files.

You need something called serialization. Ok, computer science term, but it basically means that you get your data and transform it into a well-defined and documented sequence of bytes, which can be read back to memory later by the same or another program. This sequence of bytes is architecture and platform-independent.
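To make "a well-defined sequence of bytes" concrete, here is a hand-rolled sketch for a single double. It is not XDR itself, although XDR also stores doubles as big-endian IEEE 754. The function names are hypothetical, and the sketch assumes double is IEEE 754 binary64 and that the host uses the same byte order for doubles and 64-bit integers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Encode a double as 8 bytes, most significant byte first.
   The shifts work on the integer value, so the result is the same
   on big-endian and little-endian hosts. */
void encode_double_be(double d, unsigned char out[8])
{
    uint64_t bits;
    assert(sizeof bits == sizeof d);   /* assumes a 64-bit double */
    memcpy(&bits, &d, sizeof bits);
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(bits >> (56 - 8 * i));
}

/* Rebuild the double from 8 bytes produced by encode_double_be(). */
double decode_double_be(const unsigned char in[8])
{
    uint64_t bits = 0;
    double d;
    for (int i = 0; i < 8; i++)
        bits = (bits << 8) | in[i];
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

The writer calls encode_double_be() and writes the 8 bytes; the reader reads 8 bytes and calls decode_double_be(). Neither side needs to know the other's architecture, which is the point of serialization.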

Most Unix environments already come with an XDR implementation, which provides routines for data serialization.

A simple example encoding 4 doubles to stdout (you can use shell redirection, or use fopen() to open a file instead of stdout):

/* needs <stdio.h> and <rpc/rpc.h> */
XDR xdrs;
double data[4] = { 1.0, 255.41, -357.1, 123.4 };
int i;

xdrstdio_create(&xdrs, stdout, XDR_ENCODE);
for (i = 0; i < 4; i++)
    xdr_double(&xdrs, &data[i]);

Now, to get these doubles back (from stdin) and print them:

XDR xdrs;
double data;
int i;

xdrstdio_create(&xdrs, stdin, XDR_DECODE);
for (i = 0; i < 4; i++) {
    if (!xdr_double(&xdrs, &data))   /* stop on short or unreadable input */
        break;
    printf("%g\n", data);
}

You can encode and decode complex structures using XDR. This was a very dumb way of sending four doubles to a file, and generally you should instead use xdr_array() to read/write arrays of some data type. The same commands, in the same order, have to be executed when saving and when loading the file. In fact, you can use rpcgen to generate C structs and their corresponding xdr functions automatically.

Juliano 2010-06-15 19:29:08

+1 for sample code using XDR.

RBerteig 2010-06-15 22:22:37

Really Thanks, It works!

tim 2010-06-16 03:35:02
