I've been doing some performance testing to try to speed up a pet project I'm writing. It's a very number-crunching-intensive application, so I've been playing with NumPy as a way of improving computational performance.

However, the results from the following performance tests were quite surprising.

Test Source Code (Updated with test cases for hoisting and batch submission)

import timeit

numpySetup = """
import numpy
left = numpy.array([1.0,0.0,0.0])
right = numpy.array([0.0,1.0,0.0])
"""

hoistSetup = numpySetup + 'hoist = numpy.cross\n'

pythonSetup = """
left = [1.0,0.0,0.0]
right = [0.0,1.0,0.0]
"""

numpyBatchSetup = """
import numpy

l = numpy.array([1.0,0.0,0.0])
left = numpy.array([l]*10000)

r = numpy.array([0.0,1.0,0.0])
right = numpy.array([r]*10000)
"""

pythonCrossCode = """
x = ((left[1] * right[2]) - (left[2] * right[1]))
y = ((left[2] * right[0]) - (left[0] * right[2]))
z = ((left[0] * right[1]) - (left[1] * right[0]))
"""

pythonCross = timeit.Timer(pythonCrossCode, pythonSetup)
numpyCross = timeit.Timer('numpy.cross(left, right)', numpySetup)
hybridCross = timeit.Timer(pythonCrossCode, numpySetup)
hoistCross = timeit.Timer('hoist(left, right)', hoistSetup)
batchCross = timeit.Timer('numpy.cross(left, right)', numpyBatchSetup) 

print 'Python Cross Product : %4.6f ' % pythonCross.timeit(1000000)
print 'Numpy Cross Product  : %4.6f ' % numpyCross.timeit(1000000) 
print 'Hybrid Cross Product : %4.6f ' % hybridCross.timeit(1000000) 
print 'Hoist Cross Product  : %4.6f ' % hoistCross.timeit(1000000) 
# 100 batches of 10000 each is equivalent to 1000000
print 'Batch Cross Product  : %4.6f ' % batchCross.timeit(100) 

Original Results

Python Cross Product : 0.754945 
Numpy Cross Product  : 20.752983 
Hybrid Cross Product : 4.467417 

Final Results

Python Cross Product : 0.894334 
Numpy Cross Product  : 21.099040 
Hybrid Cross Product : 4.467194 
Hoist Cross Product  : 20.896225 
Batch Cross Product  : 0.262964 

Needless to say, this wasn't the result I expected. The pure Python version performs almost 30x faster than NumPy. In other tests, NumPy's performance has been better than the Python equivalent's (which was the expected result).

So, I've got two related questions:

  • Can anyone explain why NumPy is performing so poorly in this case?
  • Is there something I can do to fix it?
+3  A: 

Try this with larger arrays. I think the cost of just calling numpy's functions here outweighs the few simple list accesses required by the Python version. If you deal with larger arrays, I think you'll see large wins for numpy.
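
A minimal sketch of that comparison (the array sizes and the numpy.tile setup here are arbitrary, not taken from the question):

import timeit

largeSetup = """
import numpy
left = numpy.array([1.0, 0.0, 0.0])
right = numpy.array([0.0, 1.0, 0.0])
# stack 100000 copies of each vector into (100000, 3) arrays
big_left = numpy.tile(left, (100000, 1))
big_right = numpy.tile(right, (100000, 1))
"""

single = timeit.Timer('numpy.cross(left, right)', largeSetup)
batched = timeit.Timer('numpy.cross(big_left, big_right)', largeSetup)

# 100000 single-pair calls versus one call over 100000 stacked pairs
print("100000 single calls : %4.6f" % single.timeit(100000))
print("1 batched call      : %4.6f" % batched.timeit(1))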

Eli Bendersky
In this particular case, 3-component arrays (x, y, z coordinates) are by far the most common case. What's also a bit weird is that even when reading from numpy arrays, the Python code is still faster. If it were call overhead, I'd expect that version to be slowed down even more than the pure NumPy solution.
Adam Luchjenbroers
@Adam: but by reading from numpy's arrays you save the overhead of calling the `cross` function itself, which is a dynamically loaded extension, so the call goes through at least a couple of pointer indirections. For such short arrays it indeed makes sense as a micro-optimization to unroll the call to `cross`.
Eli Bendersky
I just added a test case where I batched the arrays together, and saw a considerable performance boost, so I'd say the overhead theory is correct. Looks like if I want a performance boost from NumPy, I'll need to find a way of batching these operations together.
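
In case it's useful to anyone else, the batching amounts to stacking the vectors into (N, 3) arrays so that one numpy.cross call replaces N separate calls; a rough sketch (array contents made up for illustration):

import numpy

# each row of the (N, 3) inputs is one left/right vector pair
lefts = numpy.array([[1.0, 0.0, 0.0]] * 4)
rights = numpy.array([[0.0, 1.0, 0.0]] * 4)

# one call computes all the row-wise cross products at once
crosses = numpy.cross(lefts, rights)
print(crosses)   # four rows, each [0. 0. 1.]
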
Adam Luchjenbroers
+4  A: 

You can see the source code yourself here: http://www.google.com/codesearch/p?hl=en#5mAq98l-MUw/trunk/dnumpy/numpy/core/numeric.py&q=cross%20package:numpy&sa=N&cd=1&ct=rc

numpy.cross just handles lots of cases and does some extra copies.

In general, numpy is going to be plenty fast for inherently expensive operations like matrix multiplication or inversion, but operations on small vectors like this carry a lot of overhead relative to the actual work.
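
As a rough illustration (this is not numpy's actual code, just a sketch of what special-casing buys you), a cross product that only handles plain 3-vectors can skip the axis handling and 2-D/3-D dispatch entirely:

import numpy

def cross3(a, b):
    # assumes a and b are length-3 sequences; no axis arguments,
    # no 2-vector handling, no broadcasting, no extra copies
    return numpy.array([a[1] * b[2] - a[2] * b[1],
                        a[2] * b[0] - a[0] * b[2],
                        a[0] * b[1] - a[1] * b[0]])

print(cross3([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))   # [0. 0. 1.]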

dmazzoni
+1  A: 

To reduce the numpy calling overhead, you might try using Cython as an intermediate layer to call into the numpy functions.

See Fast numerical computations with Cython (SciPy 2009) for details.
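
A minimal Cython sketch of the idea (the file and function names here are made up, and this is not code from the talk): with typed numpy buffers the indexing and arithmetic compile down to C, leaving a single Python-level call per cross product.

# cross3.pyx -- build with Cython, e.g. via pyximport or a setup.py
import numpy
cimport numpy

def cross3(numpy.ndarray[numpy.float64_t, ndim=1] a,
           numpy.ndarray[numpy.float64_t, ndim=1] b):
    # typed buffer access turns these element reads into plain C indexing
    cdef numpy.ndarray[numpy.float64_t, ndim=1] out = numpy.empty(3)
    out[0] = a[1] * b[2] - a[2] * b[1]
    out[1] = a[2] * b[0] - a[0] * b[2]
    out[2] = a[0] * b[1] - a[1] * b[0]
    return out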

Ryan Ginstrom