The usual suspects -- profile it, find the most expensive line, figure out what it's doing, fix it. If you haven't done much profiling before, there could be some big fat quadratic loops or string duplication hiding behind otherwise innocuous-looking expressions.
In Python, two of the most common causes I've found for non-obvious slowdown are string concatenation and generators. Since Python's strings are immutable, doing something like this:
result = u""
for item in my_list:
result += unicode (item)
will copy the entire string twice per iteration. This has been well-covered, and the solution is to use "".join
:
result = "".join (unicode (item) for item in my_list)
Generators are another culprit. They're very easy to use and can simplify some tasks enormously, but a poorly-applied generator will be much slower than simply appending items to a list and returning the list.
Finally, don't be afraid to rewrite bits in C! Python, as a dynamic high-level language, is simply not capable of matching C's speed. If there's one function that you can't optimize any more in Python, consider extracting it to an extension module.
My favorite technique for this is to maintain both Python and C versions of a module. The Python version is written to be as clear and obvious as possible -- any bugs should be easy to diagnose and fix. Write your tests against this module. Then write the C version, and test it. Its behavior should in all cases equal that of the Python implementation -- if they differ, it should be very easy to figure out which is wrong and correct the problem.