I'm trying to understand some C code so that I can port it to Python. The code reads a proprietary binary data file format. It has been straightforward so far: it's mainly been structs, and I have been using the struct library to read particular C types from the file. However, I just came across this bit of code and I'm at a loss for how to implement it in Python. In particular, I'm not sure how to deal with the enum or the union.

#define BYTE char 
#define UBYTE unsigned char 
#define WORD short 
#define UWORD unsigned short

typedef enum {
    TEEG_EVENT_TAB1=1, 
    TEEG_EVENT_TAB2=2
} TEEG_TYPE;

typedef struct
{
    TEEG_TYPE Teeg;
    long Size;
    union
    {
        void *Ptr;  // Memory pointer
        long Offset;
    };
} TEEG;

Secondly, in the struct definition below, I'm not sure what the colons after the variable names mean (e.g., KeyPad:4). Does it mean I'm supposed to read 4 bytes?

typedef struct
{
    UWORD StimType;
    UBYTE KeyBoard;
    UBYTE KeyPad:4;
    UBYTE Accept:4;
    long Offset;
} EVENT1;

In case it's useful, an abstract example of the way I've been accessing the file in Python is as follows:

from struct import unpack, calcsize

def get(ctype, size=1):
    """Reads and unpacks binary data into the desired ctype."""
    if size == 1:
        size = ''
    else:
        size = str(size)

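    chunk = file.read(calcsize(size + ctype))
    values = unpack(size + ctype, chunk)
    # Formats such as '4l' yield several values; return them all in that case
    return values[0] if len(values) == 1 else values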

file = open("file.bin", "rb")
file.seek(1234)

var1 = get('i')
var2 = get('4l')
var3 = get('10s')

A: 

I don't know the answer to all of your questions, but for enums that you do not need lookup-by-value on (that is, you are just using them to avoid magic numbers), I like to use a small class. A regular dict is another option that works fine. If you need lookup-by-value, you may want another structure, though.

class TeegType(object):
    TEEG_EVENT_TAB1 = 1
    TEEG_EVENT_TAB2 = 2

print(TeegType.TEEG_EVENT_TAB1)
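
If lookup-by-value does turn out to be needed, one minimal sketch is to invert a plain dict. The names TEEG_TYPES, TEEG_NAMES and teeg_name are made up for illustration; only the two constant values come from the question.

```python
# Hypothetical names; only the two constant values come from the C header.
TEEG_TYPES = {
    "TEEG_EVENT_TAB1": 1,
    "TEEG_EVENT_TAB2": 2,
}

# Reverse mapping for lookup-by-value (value -> name).
TEEG_NAMES = {value: name for name, value in TEEG_TYPES.items()}


def teeg_name(value):
    """Return the symbolic name for a raw enum value, or raise ValueError."""
    try:
        return TEEG_NAMES[value]
    except KeyError:
        raise ValueError("unknown TEEG_TYPE value %r" % (value,))


print(teeg_name(2))
```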
Daenyth
+5  A: 

Enums: Python had no built-in enum type before version 3.4 (which added the enum module). Various idioms have been proposed for older versions, but none is really widespread. The most straightforward (and in this case sufficient) solution is

TEEG_EVENT_TAB1 = 1
TEEG_EVENT_TAB2 = 2
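
On Python 3.4 or later, the standard library's enum module is another option. The following is only a sketch: the class name TeegType is borrowed from the other answer, and raw_value stands in for whatever integer you unpacked from the file.

```python
from enum import IntEnum


class TeegType(IntEnum):
    """Mirror of the C enum TEEG_TYPE (values taken from the question)."""
    TEEG_EVENT_TAB1 = 1
    TEEG_EVENT_TAB2 = 2


raw_value = 2                     # stand-in for an integer read from the file
teeg_type = TeegType(raw_value)   # raises ValueError for unknown values
print(teeg_type.name)
```

Because IntEnum members compare equal to plain integers, existing checks such as teeg_type == 1 keep working.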

Unions: ctypes has unions.

The fieldname:n syntax declares a bitfield and, yeah, it does mean "this field is n bits wide" (bits, not bytes). Again, ctypes has them.
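
A rough sketch of both features in ctypes follows. The fixed-width types are my assumptions (a 4-byte long, and a 4-byte integer standing in for the pointer, as on a 32-bit build), the class name PtrOrOffset is invented, and the sample bytes are made up. Which nibble ends up in KeyPad depends on the platform's bitfield layout, so check it against real data.

```python
import ctypes
import struct


class PtrOrOffset(ctypes.Union):
    # Assumption: 32-bit origin, so the pointer is modelled as a 4-byte integer.
    _fields_ = [
        ("Ptr", ctypes.c_uint32),
        ("Offset", ctypes.c_int32),
    ]


class EVENT1(ctypes.Structure):
    _fields_ = [
        ("StimType", ctypes.c_uint16),
        ("KeyBoard", ctypes.c_uint8),
        ("KeyPad", ctypes.c_uint8, 4),   # 4-bit field
        ("Accept", ctypes.c_uint8, 4),   # 4-bit field sharing the same byte
        ("Offset", ctypes.c_int32),      # assumption: 4-byte long
    ]


# Both union members view the same four bytes.
u = PtrOrOffset()
u.Offset = 2048
print(u.Ptr)

# Made-up sample record in native byte order; 0xA3 holds the two bitfields.
sample = struct.pack("=HBBi", 7, 65, 0xA3, 1234)
event = EVENT1.from_buffer_copy(sample)
print(event.StimType, event.KeyBoard, event.KeyPad, event.Accept, event.Offset)
```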

delnan
A: 

What you really need to know is:

  1. What is the size of an enum? You will use the answer to generate your unpacking code.
  2. What is the size of a union? In short: the size of its largest member.
  3. How do you deal with that pointer? You should take a look at the ctypes module. For what you are doing, it may be easier to work with than the struct module. In particular, it can work with pointers arriving via C.
  4. How do you coerce/cast the data read from the struct into the right type to work with in python? This is why I recommended ctypes in the bullet above; this module has functions for performing the necessary casts.
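
To make the points above concrete, here is a sketch of reading a TEEG record straight into a ctypes structure. Everything about the sizes is an assumption (4-byte enum, 4-byte long, 4-byte pointer), read_struct is a helper name I made up, and the BytesIO object only stands in for your real file.

```python
import ctypes
import io
import struct


class _PtrOrOffset(ctypes.Union):
    # Assumption: the pointer occupied 4 bytes when the file was written.
    _fields_ = [("Ptr", ctypes.c_uint32), ("Offset", ctypes.c_int32)]


class TEEG(ctypes.Structure):
    _anonymous_ = ("u",)                 # exposes teeg.Ptr / teeg.Offset directly
    _fields_ = [
        ("Teeg", ctypes.c_int32),        # assumption: 4-byte enum
        ("Size", ctypes.c_int32),        # assumption: 4-byte long
        ("u", _PtrOrOffset),
    ]


def read_struct(fh, struct_cls):
    """Read sizeof(struct_cls) bytes from fh and return a populated instance."""
    size = ctypes.sizeof(struct_cls)
    raw = fh.read(size)
    if len(raw) != size:
        raise EOFError("expected %d bytes, got %d" % (size, len(raw)))
    return struct_cls.from_buffer_copy(raw)


# Stand-in for the real file: one made-up record in native byte order.
fake_file = io.BytesIO(struct.pack("=iii", 1, 100, 2048))
teeg = read_struct(fake_file, TEEG)
print(teeg.Teeg, teeg.Size, teeg.Offset)
```

If the sizes are guessed wrong, ctypes.sizeof(TEEG) will not match the record length in the file, which is usually the quickest way to notice.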
bstpierre
A: 

The C enum declaration is a syntactic wrapper around some integer type. See http://stackoverflow.com/questions/1113855/is-the-sizeofenum-sizeofint-always. How big an int is will depend on the particular C compiler. I would probably start by trying 16 bits.

The union reserves a block of memory the size of the largest of the contained data types. Again, the exact size will depend on the C implementation, but I would expect 32 bits on a 32-bit architecture, or 64 bits if this is compiled as native 64-bit code. Generally speaking, you will be able to store the contents of the union in a Python integer (an int or long in Python 2), regardless of whether what was saved in it is a pointer or an offset.
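
One way to see how much these sizes vary is to ask the struct module. This sketch compares the native sizes on whatever machine runs it with the fixed "standard" sizes, then reads a made-up 4-byte union field as a plain integer. Little-endian byte order is assumed for the sample bytes.

```python
import struct

# Native mode: sizes follow the C compiler that built this Python.
print("native long    :", struct.calcsize("l"))
print("native pointer :", struct.calcsize("P"))

# Standard mode ("<", ">" or "="): sizes are fixed on every platform.
print("standard long  :", struct.calcsize("<l"))
print("standard q     :", struct.calcsize("<q"))

# A 4-byte union field unpacks to the same integer whether the writer
# thought of it as a pointer or as an offset (sample bytes are made up).
raw = b"\x00\x08\x00\x00"
(value,) = struct.unpack("<l", raw)
print(value)
```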

A more interesting question is why a pointer would ever be written to a disk file. You may find that the union field is only treated as a pointer when the TEEG struct is in memory, but when written to disk, it is always an integer offset.

As for the :4 notation, as several people have noted, these are "bit fields," meaning a sequence of bits, several of which can be packed into a single storage unit. If I recall correctly, bitfields in C are packed into a unit of their declared type; since these two are declared as UBYTE, most compilers will likely pack both 4-bit fields into a single byte. They can be unpacked with appropriate use of Python's "&" (bitwise and) and ">>" (right shift) operators. Again, exactly how the fields have been packed, and the size of the storage unit itself, will depend on the particular C implementation.
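
As a sketch of the "&" and ">>" approach: the function name split_nibbles and the sample byte are invented, and I am assuming the first field (KeyPad) sits in the low four bits, which is typical of little-endian compilers. Swap the two results if your data looks wrong.

```python
import struct


def split_nibbles(packed):
    """Split one byte into (low 4 bits, high 4 bits)."""
    low = packed & 0x0F            # mask off the bottom four bits
    high = (packed >> 4) & 0x0F    # shift the top four bits down, then mask
    return low, high


raw = b"\xA3"                      # made-up byte holding both fields
(packed,) = struct.unpack("B", raw)

# Assumption: KeyPad is the low nibble and Accept the high nibble.
key_pad, accept = split_nibbles(packed)
print(key_pad, accept)
```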

Maybe the following code snippet will help you:

import struct

SIZEOF_TEEG_TYPE = 2      # First guess for enum is two bytes
FMT_TEEG_TYPE = "h"       # Could be "b", "B", "h", "H", "l", "L", "q" or "Q"

SIZEOF_LONG = 4           # Use 8 in 64-bit Unix architectures
FMT_LONG = "l"            # Use "q" in 64-bit Unix architectures
                          # Life gets more interesting if you are reading 64-bit
                          # using 32-bit Python

SIZEOF_PTR_LONG_UNION = 4 # Use 8 in any 64-bit architecture
FMT_PTR_LONG_UNION = "l"  # Use "q" in any 64-bit architecture
                          # Life gets more interesting if you are reading 64-bit
                          # using 32-bit Python

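SIZEOF_TEEG_STRUCT = SIZEOF_TEEG_TYPE + SIZEOF_LONG + SIZEOF_PTR_LONG_UNION
# "=" selects standard sizes with no alignment padding, so the format's size
# matches SIZEOF_TEEG_STRUCT; use "<" or ">" if you know the file's byte order
FMT_TEEG_STRUCT = "=" + FMT_TEEG_TYPE + FMT_LONG + FMT_PTR_LONG_UNION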


# Constants for TEEG_EVENTs
TEEG_EVENT_TAB1 = 1
TEEG_EVENT_TAB2 = 2

.
.
.

# Read a TEEG structure
teeg_raw = file_handle.read( SIZEOF_TEEG_STRUCT )
teeg_type, teeg_size, teeg_offset = struct.unpack( FMT_TEEG_STRUCT, teeg_raw )

.
.
.

# Use TEEG_TYPE information
if teeg_type == TEEG_EVENT_TAB1:
    pass  # Do something useful

elif teeg_type == TEEG_EVENT_TAB2:
    pass  # Do something else useful

else:
    raise ValueError( "Encountered illegal TEEG_EVENT type %d" % teeg_type )
Dan Menes