I'm in the process of tuning a pet project of mine to improve its performance. I've already busted out the profiler to identify hotspots but I'm thinking understanding Pythons performance characteristics a little better would be quite useful going forward.
There are a few things I'd like to know:
How smart is its optimizer?
Some modern compilers have been blessed with remarkably clever optimisers, that can often take simple code and make it run faster than any human attempts at tuning the code. Depending on how smart the optimizer is, it may be far better for my code to be 'dumb'.
While Python is an 'interpreted' language, it does appear to compile down to some form of bytecode (.pyc). How smart is it when it does this?
- Will it fold constants?
- Will it inline small functions or unroll short loops?
- Will it perform complex data/flow analysis I'm not qualified to explain properly.
How fast are the following operations (comparatively)
- Function calls
- Class instantiation
- Arithmetic
- 'Heavier' math operations such as sqrt()
How are numbers handled internally?
How are numbers stored within Python. Are they stored as integers / floats internally or moved around as a string?
NumPy
How much of a performance difference can NumPy make? This application makes heavy use of Vectors and related mathematics. How much of a difference can be made by using this to accelerate these operations.
Anything else interesting
If you can think of anything else worth knowing, feel free to mention it.
Some background...
Since there are a few people bringing in the 'look at your algorithms first' advice (which is quite sensible advice, but doesn't really help with my purpose in asking this question) I'll add a bit here about whats going on, and why I'm asking about this.
The pet project in question is a ray-tracer written in Python. It's not yet very far along and currently just hit tests against two objects (a triangle and a sphere) within the scene. No shading, shadowing or lighting calculations are being performed. The algorithm is basically:
for each x, y position in the image:
create a ray
hit test vs. sphere
hit test vs. triangle
colour the pixel based on the closest object, or black if no hit.
Algorithmic refinements in ray-tracing generally work by eliminating objects in the scene early. They can provide a considerable boost for complex scenes, but if this ray-tracer can't hit test against a mere two objects without struggling, then it's not going to be able to handle very much at all.
While I realise that a Python based ray-tracer won't quite be able to reach the performance of a C based one, given that real-time ray tracers like Arauna can manage 15-20 FPS on my computer rendering reasonably complex scenes at 640x480, I'd expect rendering a very basic 500x500 image in Python to be doable in under a second.
Currently, my code is taking 38 seconds. It seems to me like it really shouldn't take that long.
Profiling shows the bulk of the time being spent in the actual hit testing routines for these shapes. This isn't particularly surprising in a ray tracer, and what I expected. The call-counts for these hit tests are each 250,000 (500x500 exactly), which would indicate they're being called exactly as often as they should be. This is a a pretty text-book case of the 3% where optimization is advisable.
I'm planning on doing the full timing / measuring thing as I work on improving the code. However, without some basic knowledge of what costs what in Python my attempts to tune my code would be little more than stumbling in the dark. I figured it would serve me well to gain a little knowledge to light the way.