views:

599

answers:

8

I'm attempting to write a Python C extension that reads packed binary data (it is stored as structs of structs) and then parses it out into Python objects. Everything works as expected on a 32 bit machine (the binary files are always written on 32bit architecture), but not on a 64 bit box. Is there a "preferred" way of doing this?


It would be a lot of code to post but as an example:

struct
{
 WORD version;
 BOOL upgrade;
 time_t time1;
            time_t  time2;
} apparms;

File *fp;
fp = fopen(filePath, "r+b");
fread(&apparms, sizeof(apparms), 1, fp);
return Py_BuildValue("{s:i,s:l,s:l}",
  "sysVersion",apparms.version,
  "powerFailTime", apparms.time1,
  "normKitExpDate", apparms.time2
 );

Now on a 32 bit system this works great, but on a 64 bit my time_t sizes are different (32bit vs 64 bit longs).


Damn, you people are fast.

Patrick, I originally started using the struct package but found it just way to slow for my needs. Plus I was looking for an excuse to write a Python Extension.

I know this is a stupid question but what types do I need to watch out for?

Thanks.

A: 

Why not paste the code?

mbac32768
+1  A: 

What's your code for reading the binary data? Make sure you're copying the data into properly-sized types like int32_t instead of just int.

John Millikin
A: 

Why aren't you using the struct package?

Patrick
+2  A: 

The 'struct' module should be able to do this, although alignment of structs in the middle of the data is always an issue. It's not very hard to get it right, however: find out (once) what boundary the structs-in-structs align to, then pad (manually, with the 'x' specifier) to that boundary. You can doublecheck your padding by comparing struct.calcsize() with your actual data. It's certainly easier than writing a C extension for it.

In order to keep using Py_BuildValue() like that, you have two options. You can determine the size of time_t at compiletime (in terms of fundamental types, so 'an int' or 'a long' or 'an ssize_t') and then use the right format character to Py_BuildValue -- 'i' for an int, 'l' for a long, 'n' for an ssize_t. Or you can use PyInt_FromSsize_t() manually, in which case the compiler does the upcasting for you, and then use the 'O' format characters to pass the result to Py_BuildValue.

Thomas Wouters
+2  A: 

Explicitly specify that your data types (e.g. integers) are 32-bit. Otherwise if you have two integers next to each other when you read them they will be read as one 64-bit integer.

When you are dealing with cross-platform issues, the two main things to watch out for are:

  1. Bitness. If your packed data is written with 32-bit ints, then all of your code must explicitly specify 32-bit ints when reading and writing.
  2. Byte order. If you move your code from Intel chips to PPC or SPARC, your byte order will be wrong. You will have to import your data and then byte-flip it so that it matches up with the current architecture. Otherwise 12 (0x0000000C) will be read as 201326592 (0x0C000000).

Hopefully this helps.

Dan Udey
A: 

It would be a lot of code to post but as an example:

struct
{
 WORD version;
 BOOL upgrade;
 time_t time1;
            time_t  time2;
} apparms;

File *fp;
fp = fopen(filePath, "r+b");
fread(&apparms, sizeof(apparms), 1, fp);
return Py_BuildValue("{s:i,s:l,s:l}",
  "sysVersion",apparms.version,
  "powerFailTime", apparms.time1,
  "normKitExpDate", apparms.time2
 );

Now on a 32 bit system this works great, but on a 64 bit my time_t sizes are different (32bit vs 64 bit longs).

Mark
Moved this into the main question -- if you have more information to add, please add it to the question, not as an answer (unless you found the answer).
John Millikin
A: 

Damn, you people are fast.

Patrick, I originally started using the struct package but found it just way to slow for my needs. Plus I was looking for an excuse to write a Python Extension.

I know this is a stupid question but what types do I need to watch out for?

Thanks.

Mark
You pretty much need to watch out for any type other than the architecture neutral ones in stdint.h. (Even then, you're still need to deal with endian issues). From your struct above, pretty much all members look archiecture specific. You'll need to define them by their size, and then upcast them.
Brian
+1  A: 

You need to make sure you're using architecture independant members for your struct. For instance an int may be 32 bits on one architecture and 64 bits on another. As others have suggested, use the int32_t style types instead. If your struct contains unaligned members, you may need to deal with padding added by the compiler too.

Another common problem with cross architecture data is endianness. i386 architectre is little-endian, but if you're reading on a completely different machine( eg. an Alpha or Sparc), you'll have to worry about this too.

The python struct module deals with both these situations, using the prefix passed as part of the format string.

  • @ - Use native size, endianness and alignment. i= sizeof(int), l= sizeof(long)
  • = - Use native endianness, but standard sizes and alignment (i=32 bits, l=64 bits)
  • < - Little-endian standard sizes/alignment
  • > - Big-endian standard sizes / alignment

In general, if the data passes off your machine, you should nail down the endianness and the size / padding format to something specific. ie. use "<" or ">" as your format. If you want to handle this in your C extension, you may need to add some code to handle it.

Brian