I often have to write code in other languages that interact with C structs. Most typically this involves writing Python code with the struct or ctypes modules.

So I'll have a .h file full of struct definitions, and I have to manually read through them and duplicate those definitions in my Python code. This is time consuming and error-prone, and it's difficult to keep the two definitions in sync when they change frequently.

Is there some tool or library in any language (doesn't have to be C or Python) which can take a .h file and produce a structured list of its structs and their fields? I'd love to be able to write a script to automatically generate my struct definitions in Python, and I don't want to have to process arbitrary C code to do it. Regular expressions would work great about 90% of the time and then cause endless headaches for the remaining 10%.
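For anyone who hasn't had to do this, here is a minimal sketch of the manual duplication being described, using ctypes. The C struct and its field names are hypothetical; the point is that every name, type and field order has to be copied and kept in sync by hand.

```python
import ctypes

# Hand-written mirror of a HYPOTHETICAL C declaration:
#   struct packet_header { uint16_t type; uint32_t length; uint8_t flags; };
# If the header changes, this class silently goes out of date.
class PacketHeader(ctypes.Structure):
    _fields_ = [
        ("type", ctypes.c_uint16),
        ("length", ctypes.c_uint32),
        ("flags", ctypes.c_uint8),
    ]

hdr = PacketHeader(type=1, length=512, flags=0x80)
raw = bytes(hdr)  # native in-memory layout, including any padding bytes
back = PacketHeader.from_buffer_copy(raw)
print(back.type, back.length, back.flags)
```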

+2  A: 

Have you looked at Swig?

mmr
A: 

A friend of mine wrote a C parser for exactly this kind of task, which he uses together with cog.

vitaly.v.ch
+3  A: 

Have a look at Swig or SIP, which can generate the interface code for you, or use ctypes.

Gregory Pakosz
+6  A: 
ephemient
+3  A: 

Regular expressions would work great about 90% of the time and then cause endless headaches for the remaining 10%.

The headaches happen in the cases where the C code contains syntax that you didn't think of when writing your regular expressions. Then you go back and realise that C can't really be parsed by regular expressions, and life becomes not fun.
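To make the point concrete, here is a deliberately naive regex (my own illustration, not anyone's real tool) that handles a trivial struct but silently returns nothing for a nested struct or for the anonymous `typedef struct { ... } name;` form used in the example below.

```python
import re

# Deliberately naive: "struct NAME { BODY };" where BODY contains no '}'.
NAIVE = re.compile(r"struct\s+(\w+)\s*\{([^}]*)\}\s*;")

def naive_fields(src):
    """Return {struct_name: [field declarations]} using only the regex."""
    result = {}
    for name, body in NAIVE.findall(src):
        result[name] = [d.strip() for d in body.split(";") if d.strip()]
    return result

simple = "struct point { int x; int y; };"
# The inner '}' is followed by ' a;', not ';', so neither struct matches.
nested = "struct line { struct point { int x; int y; } a; int width; };"
# No tag name after 'struct', so the pattern never even starts matching.
anon = "typedef struct { short port; int flags; } socketopts;"

print(naive_fields(simple))
print(naive_fields(nested))
print(naive_fields(anon))
```

Each fix (allowing nesting, typedefs, comments, `#ifdef`, bitfields, arrays) makes the pattern longer without ever covering all of C.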

Try turning it around: define your own simple format, which allows less tricks than C does, and generate both the C header file and the Python interface code from your file:

define socketopts
    int16 port
    int32 ipv4address
    int32 flags

Then you can easily write some Python to convert this to:

typedef struct {
    short port;
    int ipv4address;
    int flags;
} socketopts;
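A minimal sketch of that conversion, assuming the DSL is exactly "`define NAME`" followed by indented "`TYPE FIELD`" lines. The `C_TYPES` table is a hypothetical mapping chosen to reproduce the output above; `<stdint.h>` types like `int16_t` would pin the widths down more reliably.

```python
# HYPOTHETICAL DSL-type -> C-type table, chosen to match the example output.
C_TYPES = {"int8": "signed char", "int16": "short", "int32": "int"}

def parse_dsl(text):
    """Parse 'define NAME' blocks followed by indented 'TYPE FIELD' lines."""
    structs = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        words = line.split()
        if words[0] == "define" and len(words) == 2:
            current = words[1]
            structs[current] = []
        elif current is not None and line[0].isspace() and len(words) == 2:
            structs[current].append((words[0], words[1]))
        else:
            raise ValueError(f"cannot parse line: {line!r}")
    return structs

def emit_c(name, fields):
    """Render one parsed struct as an anonymous typedef'd C struct."""
    lines = ["typedef struct {"]
    for dsl_type, field in fields:
        lines.append(f"    {C_TYPES[dsl_type]} {field};")
    lines.append(f"}} {name};")
    return "\n".join(lines)

SPEC = """
define socketopts
    int16 port
    int32 ipv4address
    int32 flags
"""

for name, fields in parse_dsl(SPEC).items():
    print(emit_c(name, fields))
```

Because the parser rejects anything it doesn't recognise, a malformed spec fails loudly instead of producing a half-translated header.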

and also to emit a Python class which uses struct to pack/unpack three values (possibly two of them big-endian and the other native-endian, up to you).
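A rough sketch of the Python side, assuming the DSL has already been parsed into `(type, field)` pairs. Rather than emitting source text, it builds the class at runtime; `FORMAT_CODES` and `make_struct_class` are hypothetical names. A single byte-order prefix covers the whole struct, so mixing big-endian and native fields would need per-field packing. Note also that with a `<`, `>` or `=` prefix the struct module inserts no padding, so this describes a packed wire layout, not necessarily the C struct's in-memory layout.

```python
import struct
from collections import namedtuple

# HYPOTHETICAL DSL-type -> struct format code table. With a '<', '>' or '='
# prefix these use standard sizes: b = 1 byte, h = 2 bytes, i = 4 bytes.
FORMAT_CODES = {"int8": "b", "int16": "h", "int32": "i"}

def make_struct_class(name, fields, byte_order=">"):
    """Build a namedtuple subclass with pack()/unpack() for the given fields.

    fields is a list of (dsl_type, field_name) pairs in declaration order.
    """
    layout = struct.Struct(byte_order + "".join(FORMAT_CODES[t] for t, _ in fields))
    base = namedtuple(name, [f for _, f in fields])

    def pack(self):
        return layout.pack(*self)

    @classmethod
    def unpack(cls, data):
        return cls(*layout.unpack(data))

    return type(name, (base,), {"__slots__": (), "layout": layout,
                                "pack": pack, "unpack": unpack})

SocketOpts = make_struct_class(
    "socketopts",
    [("int16", "port"), ("int32", "ipv4address"), ("int32", "flags")],
)

opts = SocketOpts(port=8080, ipv4address=0x7F000001, flags=1)
data = opts.pack()
print(len(data), SocketOpts.unpack(data))
```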

Steve Jessop
I definitely considered this, but often we're handed code from some other company whose custom protocol we need to implement in order to communicate with it. Since we can't rewrite their code but do have access to their header files, this approach isn't feasible. However, if I were implementing a system with both C and Python components from scratch myself, I would definitely do this.
Eli Courtwright
Also, I just noticed that my example is still pretty horrible, since the Python code needs to account for the platform-dependent padding between "port" and "ipv4address". You could perhaps address "error-prone" by keeping this scheme, manually translating the headers to the DSL, and then auto-generating some tests (written in C) which check that your struct and the original struct are identical: zero both structs with memset (so the padding bytes compare equal), write the same specific values to the corresponding fields of each, and then memcmp them. Then test the Python code the same way. If everything matches, you're good.
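A small Python-only illustration of the padding problem and of the write-values-then-compare check. It assumes ctypes lays out `Structure` fields with the same alignment rules as your C compiler, so ctypes stands in for the compiler here; a real check would still compare against the compiled C struct.

```python
import ctypes
import struct

# Mirror of the example socketopts struct; ctypes applies the platform's
# native C alignment, so padding appears after 'port' on typical ABIs.
class SocketOpts(ctypes.Structure):
    _fields_ = [
        ("port", ctypes.c_short),
        ("ipv4address", ctypes.c_int),
        ("flags", ctypes.c_int),
    ]

# '=' : native byte order, standard sizes, no padding.
# '@' : native byte order, native sizes AND native alignment (padding added).
packed_size = struct.calcsize("=hii")
native_size = struct.calcsize("@hii")
print(packed_size, native_size, ctypes.sizeof(SocketOpts))

# Analogue of the memset/write/memcmp test: put the same values into both
# representations and compare raw bytes. New ctypes instances are zero-filled
# and struct pads with zero bytes, so padding bytes compare equal.
values = (80, 0x7F000001, 3)
c_bytes = bytes(SocketOpts(*values))
py_bytes = struct.pack("@hii", *values)
assert c_bytes == py_bytes, "layouts differ"
print("layouts match")
```

On common platforms the first line shows the packed size (10) differing from the native size, which is exactly the gap a hand-written format string with the wrong prefix would miss.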
Steve Jessop
... if your third-party sends you a header file that you can't translate into your DSL, then either extend the DSL or else complain ;-) But I prefer ephemient's answer, it's bound to be much less work, if only because all the padding info is pulled straight from the compiler.
Steve Jessop
+1  A: 

I have quite successfully used GCCXML on fairly large projects. You get an XML representation of the C code (including structures) which you can post-process with some simple Python.
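A sketch of that post-processing with `xml.etree.ElementTree`. The sample XML is hand-written to resemble GCC-XML output as I remember it (`Struct` elements with a space-separated `members` list, `Field` elements pointing at a type `id`, sizes and offsets in bits); it is not captured from a real run, so check element and attribute names against your own output before relying on this.

```python
import xml.etree.ElementTree as ET

# HAND-WRITTEN sample resembling GCC-XML output; not produced by the tool.
SAMPLE = """<GCC_XML>
  <Struct id="_3" name="socketopts" members="_4 _5 _6" size="96" align="32"/>
  <Field id="_4" name="port" type="_7" offset="0" context="_3"/>
  <Field id="_5" name="ipv4address" type="_8" offset="32" context="_3"/>
  <Field id="_6" name="flags" type="_8" offset="64" context="_3"/>
  <FundamentalType id="_7" name="short int" size="16" align="16"/>
  <FundamentalType id="_8" name="int" size="32" align="32"/>
</GCC_XML>"""

def structs_from_gccxml(xml_text):
    """Return {struct_name: [(field_name, type_name, offset), ...]}."""
    root = ET.fromstring(xml_text)
    by_id = {el.get("id"): el for el in root if el.get("id")}
    result = {}
    for st in root.iter("Struct"):
        fields = []
        for member_id in st.get("members", "").split():
            member = by_id.get(member_id)
            if member is None or member.tag != "Field":
                continue  # skip non-field members (e.g. implicit methods)
            type_el = by_id.get(member.get("type"))
            # Fall back to the raw type id for unnamed types (pointers, arrays...).
            type_name = type_el.get("name") if type_el is not None else member.get("type")
            fields.append((member.get("name"), type_name, int(member.get("offset", "0"))))
        result[st.get("name")] = fields
    return result

for name, fields in structs_from_gccxml(SAMPLE).items():
    print(name)
    for field, ctype, offset in fields:
        print(f"  {field}: {ctype} (offset {offset})")
```

From there, mapping the type names to ctypes or struct format codes is a small lookup table.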

gooli