I would like to translate some C code to Python code or bytecode. The C code in question is what I'd call purely algorithmic: platform independent, no I/O, just algorithms and in-memory data structures.

An example would be a regular expression library. The translation tool would process the library's source code and produce a functionally equivalent Python module that can be run in a sandboxed environment.

What specific approaches, tools and techniques can you recommend?


Note: a Python C extension or ctypes is not an option because the environment is sandboxed.

Another note: it looks like there is a C-to-Java-bytecode compiler; they even compiled libjpeg to Java. Is Java bytecode + VM too different from CPython bytecode + VM?

A: 

Why not keep the C code and create a Python C module which can be imported into a running Python environment?

jk
A Python C extension is not an option because the environment is sandboxed. I updated the question to reflect that.
Constantin
A: 

First, I'd consider wrapping the existing C library with Pythonic goodness to provide an API in the form of a Python module. I'd look at SWIG, ctypes, Pyrex, and whatever else is out there these days. The C library itself would stay there unchanged. Saves work.

But if I really had to write original Python code based on the C, there's no tool I'd use, just my brain. C allows too many funny tricks with pointers, clever things with macros, etc., so I'd never trust an automated tool even if someone pointed one out to me.

I mentioned Pyrex: this is a language similar to C but also Python-oriented. I haven't done much with it, but it could be easier than writing pure Python, given that you're starting with C as a guide.

Even converting from more constrained, tamer languages such as IDL (the data language scientists like to use, not the other IDL) is hard, requiring manual and mental effort. C? Forget it, not until the UFO people give us their fancy software tools that are a thousand years ahead of our state of the art!

DarenW
"Macro magic" is not a fundamental issue, it is eliminated by a single preprocessor pass.
Constantin
+8  A: 

There is frankly no way to mechanically and meaningfully translate C to Python without suffering an insane performance penalty. As we all know, Python isn't anywhere near C speed (with current compilers and interpreters). Worse, what C is good at (bit-fiddling, integer math, tricks with blocks of memory) Python is very slow at, and what Python is good at you can't express in C directly. A direct translation would therefore be extra inefficient, to the point of absurdity.
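To illustrate the point about integer math, here is a minimal sketch of what a literal translation of a small C routine might look like. The choice of the 32-bit FNV-1a hash and the name `fnv1a_32` are assumptions made for this example, not something from the question. In C, `uint32_t` multiplication wraps around for free; in Python, integers are unbounded, so a faithful translation has to mask after every operation that can overflow, and each of those steps runs through the interpreter.

```python
# Hypothetical example: a line-by-line Python rendering of a C function like
#
#     uint32_t fnv1a_32(const unsigned char *s, size_t n) {
#         uint32_t h = 2166136261u;
#         for (size_t i = 0; i < n; i++) { h ^= s[i]; h *= 16777619u; }
#         return h;
#     }

MASK32 = 0xFFFFFFFF        # emulates uint32_t wraparound
FNV_OFFSET = 2166136261    # FNV-1a 32-bit offset basis
FNV_PRIME = 16777619       # FNV-1a 32-bit prime


def fnv1a_32(data: bytes) -> int:
    """Compute the 32-bit FNV-1a hash of data, mimicking C unsigned arithmetic."""
    h = FNV_OFFSET
    for byte in data:
        h ^= byte
        # Python ints never overflow, so the wraparound must be made explicit.
        h = (h * FNV_PRIME) & MASK32
    return h


print(hex(fnv1a_32(b"a")))  # 0xe40c292c
```

The result matches the C version, but every iteration pays for bytecode dispatch, big-integer arithmetic and an extra masking step that the C compiler gets for nothing.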

The much, much better approach in general is indeed to keep the C as C, and wrap it in a Python extension module (using SWIG, Pyrex, Cython or a hand-written wrapper) or call the C library directly using ctypes. You get all the benefits (and downsides) of C for whatever is already C or that you later add in C, and all the convenience (and downsides) of Python for any code in Python.

That won't satisfy your 'sandboxing' needs, but you should realize that you cannot sandbox Python particularly well anyway; it takes a lot of effort and modification of CPython, and if you forget one little hole somewhere your jail is broken. If you want to sandbox Python you should start by sandboxing the entire process, and then C extensions can get sandboxed too.

Thomas Wouters
A: 

Any automatic translation is going to suffer for not using the power of Python. C-style procedural code would run very slowly if translated directly into Python; you would need to profile it and replace whole sections with more Python-optimized code.

paxdiablo
+3  A: 

The fastest way (in terms of programmer effort, not efficiency) would probably involve using an existing compiler to compile C to something simple (for example LLVM) and either:

  • interpret that in Python (exorbitant performance penalty)
  • translate that to Python (huge performance penalty)
  • translate that to Python bytecode (big performance penalty)

Translating C to Python directly is possible (and probably yields faster code than the above approaches), but you'd be essentially writing a C compiler backend, which is a huge task.
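The first option in the list above (interpreting a compiler's simple intermediate output in Python) can be sketched roughly as follows. Note that the instruction set here is a made-up three-address code invented for this illustration; it is not real LLVM IR, and the names `run` and `SUM_BELOW_N` are hypothetical. A real implementation would need to parse actual compiler output and model memory, pointers and calls.

```python
MASK32 = 0xFFFFFFFF  # assumption: registers behave like 32-bit unsigned ints


def run(program, args):
    """Interpret a list of (opcode, operands...) tuples; args preloads registers."""
    regs = dict(args)
    pc = 0  # index of the next instruction
    while True:
        op, *operands = program[pc]
        pc += 1
        if op == "const":
            dst, value = operands
            regs[dst] = value & MASK32
        elif op == "add":
            dst, a, b = operands
            regs[dst] = (regs[a] + regs[b]) & MASK32
        elif op == "lt":
            dst, a, b = operands
            regs[dst] = int(regs[a] < regs[b])
        elif op == "br_zero":  # jump to target if the register holds 0
            cond, target = operands
            if regs[cond] == 0:
                pc = target
        elif op == "jmp":
            (pc,) = operands
        elif op == "ret":
            return regs[operands[0]]
        else:
            raise ValueError(f"unknown opcode: {op}")


# Roughly: acc = 0; for (i = 0; i < n; i++) acc += i; return acc;
SUM_BELOW_N = [
    ("const", "i", 0),        # 0
    ("const", "acc", 0),      # 1
    ("const", "one", 1),      # 2
    ("lt", "c", "i", "n"),    # 3: loop condition
    ("br_zero", "c", 8),      # 4: exit loop when i >= n
    ("add", "acc", "acc", "i"),  # 5
    ("add", "i", "i", "one"),    # 6
    ("jmp", 3),               # 7
    ("ret", "acc"),           # 8
]

print(run(SUM_BELOW_N, {"n": 5}))  # 10
```

Every low-level instruction costs several Python-level operations (tuple unpacking, dictionary lookups, a chain of comparisons), which is where the exorbitant penalty comes from.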

Edit, afterthought: A perhaps even quicker and dirtier way of doing that is to take the parse tree for the C code, transform it into a Python data structure and interpret that in Python.
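As a rough sketch of that idea, assume some C parser has already produced a parse tree and that it has been converted to nested Python tuples. The node names (`"assign"`, `"while"`, `"mod"` and so on) and the helpers `eval_expr`, `exec_stmt` and `call` are all invented for this illustration. Only a tiny subset of C is covered: integer variables, a few binary operators, `while` and `return`.

```python
import operator


class _Return(Exception):
    """Unwinds the interpreter when a C 'return' statement is executed."""
    def __init__(self, value):
        self.value = value


def c_mod(a, b):
    # C's % truncates toward zero (result takes the sign of the dividend);
    # Python's % floors, so the difference has to be emulated.
    r = abs(a) % abs(b)
    return -r if a < 0 else r


BINOPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mod": c_mod,
    "ne": lambda a, b: int(a != b),
}


def eval_expr(node, env):
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        return env[node[1]]
    return BINOPS[kind](eval_expr(node[1], env), eval_expr(node[2], env))


def exec_stmt(node, env):
    kind = node[0]
    if kind == "assign":
        env[node[1]] = eval_expr(node[2], env)
    elif kind == "while":
        while eval_expr(node[1], env):
            exec_stmt(node[2], env)
    elif kind == "block":
        for stmt in node[1]:
            exec_stmt(stmt, env)
    elif kind == "return":
        raise _Return(eval_expr(node[1], env))
    else:
        raise ValueError(f"unknown statement: {kind}")


def call(body, **args):
    """Run a function body with the given arguments as its initial variables."""
    env = dict(args)
    try:
        exec_stmt(body, env)
    except _Return as ret:
        return ret.value
    return None


# int gcd(int a, int b) { while (b != 0) { int t = a % b; a = b; b = t; } return a; }
GCD_BODY = ("block", [
    ("while", ("ne", ("var", "b"), ("num", 0)), ("block", [
        ("assign", "t", ("mod", ("var", "a"), ("var", "b"))),
        ("assign", "a", ("var", "b")),
        ("assign", "b", ("var", "t")),
    ])),
    ("return", ("var", "a")),
])

print(call(GCD_BODY, a=48, b=18))  # 6
```

Even this toy shows where the work would go: C semantics such as truncating `%`, fixed-width integers, pointers and structs each need explicit emulation on the Python side.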

Rafał Dowgird
I did think of LLVM, but didn't think of interpreting it. Good point.
Constantin
Well, maybe interpreting the C parse tree directly in Python would be easier - added that in an edit.
Rafał Dowgird
A: 

You can always compile the C code and load the resulting libraries using ctypes in Python.

Sridhar Iyer
It wasn't me who downvoted you, but I can understand why: please pay attention to the question. The target environment is sandboxed and accepts only pure Python modules.
Constantin
Agreed, my fault.
Sridhar Iyer
A: 

I'd personally use a tool to extract a UML diagram from the C code, then use it to generate Python code.

From this skeleton, I'd start by getting rid of the unnecessary C-style structures, and then I'd fill in the methods with Python code.

I think it would be the safest and yet most efficient way.

e-satis
Which tool would you use and which UML diagram would you generate? Static Class Diagram? :-S
Constantin
PowerAMC will do the job, but it is pretty expensive. And yes, I don't think anything other than static class diagrams can be rendered efficiently enough by an automatic process, so you'll have to translate the class logic from C to Python yourself. But it's still much easier than doing everything from scratch.
e-satis
+1  A: 

Write a C interpreter in pure Python? ;-)

theller
Hey, Thomas, is that your next project, by any chance? :)
Constantin
If I had to write such a beast I would of course use Python. But I don't have to...
theller