views:

298

answers:

4

I'm having a lot of fun learning Python by writing a genetic programming type of application.

I've had some great advice from Torsten Marek, Paul Hankin and Alex Martelli on this site.

The program has 4 main functions:

  • generate (randomly) an expression tree.
  • evaluate the fitness of the tree
  • crossbreed
  • mutate

As all of generate, crossbreed and mutate call 'evaluate the fitness'. it is the busiest function and is the primary bottleneck speedwise.

As is the nature of genetic algorithms, it has to search an immense solution space so the faster the better. I want to speed up each of these functions. I'll start with the fitness evaluator. My question is what is the best way to do this. I've been looking into cython, ctypes and 'linking and embedding'. They are all new to me and quite beyond me at the moment but I look forward to learning one and eventually all of them.

The 'fitness function' needs to compare the value of the expression tree to the value of the target expression. So it will consist of a postfix evaluator which will read the tree in a postfix order. I have all the code in python.

I need advice on which I should learn and use now: cython, ctypes or linking and embedding.

Thank you.

A: 

Another great option is boost::python which lets you easily wrap C or C++.

Of these possibilities though, since you have python code already written, cython is probably a good thing to try first. Perhaps you won't have to rewrite any code to get a speedup.

clahey
Yes, Ive heard of it. It means I'd have to learn enough c to write the funcions in it, which is probably a good idea.
Peter Stewart
+2  A: 

Cython is the quickest to get the job done, either by writing your algorithm directly in Cython, or by writing it in C and bind it to python with Cython.

My advice: learn Cython.

Olivier
cython appealed most to me, but would it be the fastest?
Peter Stewart
+5  A: 

Ignore everyone elses' answer for now. The first thing you should learn to use is the profiler. Python comes with a profile/cProfile; you should learn how to read the results and analyze where the real bottlenecks is. The goal of optimization is three-fold: reduce the time spent on each call, reduce the number of calls to be made, and reduce memory usage to reduce disk thrashing.

The first goal is relatively easy. The profiler will show you the most time-consuming functions and you can go straight to that function to optimize it.

The second and third goal is harder since this means you need to change the algorithm to reduce the need to make so much calls. Find the functions that have high number of calls and try to find ways to reduce the need to call them. Utilize the built-in collections, they're very well optimized.

If you have done all the above and still have performance problem and you're on an x86 platform (basically most CPU), then start looking at Psyco. Psyco can optimize a Python code with the need to change your python code at all.

If you're doing a lot of number and array processing, you should take a look at Numpy/Scipy and gmpy third party modules.

Next to try is Cython. Cython is a slightly different language than Python, in fact Cython is actually C with Python's syntax.

For parts of your code that is in very tight loops that you can no longer optimize using any other ways, you may want to rewrite it as C extension. Python has a very good support for extending with C.

Lie Ryan
A: 

Try to work your fitness function so that it will support memoization. This will replace all calls that are duplicates of previous calls with a quick dict lookup.

Paul McGuire