A few years ago, someone posted three Python/NumPy functions on ActiveState Recipes for comparison purposes; each accepted the same arguments and returned the same result, a distance matrix.
Two of these were taken from published sources; both are--or at least appear to me to be--idiomatic NumPy code. The repetitive calculations required to create a distance matrix are driven by NumPy's elegant indexing syntax. Here's one of them:
from numpy import sqrt, sum
from numpy.matlib import repmat, repeat

def calcDistanceMatrixFastEuclidean(points):
    numPoints = len(points)
    # tile the points two different ways, subtract, and sum the squared
    # differences along each row to get every pairwise squared distance
    distMat = sqrt(sum((repmat(points, numPoints, 1) -
                        repeat(points, numPoints, axis=0))**2, axis=1))
    return distMat.reshape((numPoints, numPoints))
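For concreteness, here is a minimal usage sketch; the random 1,000-point input is just my own illustration, not part of the original recipe:

import numpy as np
pts = np.random.rand(1000, 2)                 # 1,000 random 2D points
D = calcDistanceMatrixFastEuclidean(pts)
print(D.shape)                                # (1000, 1000); D[i, j] is the distance between pts[i] and pts[j]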
The third created the distance matrix using a single loop (which, obviously, is a lot of looping given that a distance matrix of just 1,000 2D points has one million entries). At first glance this function looked to me like the code I used to write when I was learning NumPy, when I would write NumPy code by first writing Python code and then translating it, line by line.
Several months after the ActiveState post, results of performance tests comparing the three were posted and discussed in a thread on the NumPy mailing list.
The function with the loop in fact significantly outperformed the other two:
from numpy import array, zeros, newaxis, sqrt

def calcDistanceMatrixFastEuclidean2(nDimPoints):
    nDimPoints = array(nDimPoints)
    n, m = nDimPoints.shape
    delta = zeros((n, n), 'd')
    # loop over the m coordinate dimensions, accumulating squared
    # differences into an n x n matrix one dimension at a time
    for d in xrange(m):
        data = nDimPoints[:, d]
        delta += (data - data[:, newaxis])**2
    return sqrt(delta)
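To see the gap for yourself, a rough timing harness along these lines should work; the point count and the use of timeit are my own choices here, not taken from the original thread:

import timeit
import numpy as np

pts = np.random.rand(1000, 2)

# time each implementation a few times and keep the best run
for fn in (calcDistanceMatrixFastEuclidean, calcDistanceMatrixFastEuclidean2):
    t = min(timeit.repeat(lambda: fn(pts), number=3, repeat=3))
    print(fn.__name__, t)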
One participant in the thread (Keir Mierle) offered a reason why this might be true:
The reason that I suspect this will be faster is that it has better locality, completely finishing a computation on a relatively small working set before moving onto the next one. The one liners have to pull the potentially large MxN array into the processor repeatedly.
By this poster's own account, his remark is only a suspicion, and it doesn't appear that it was discussed any further.
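One way to make the locality argument concrete is to compare the sizes of the temporary arrays each version touches. The back-of-the-envelope sketch below, for 1,000 2D points in double precision, is my own illustration rather than anything from the thread:

# rough working-set estimate for N points in m dimensions
N, m = 1000, 2
itemsize = 8                                  # bytes per float64

# the one-liner builds (N*N, m) temporaries (repmat, repeat, their difference, its square)
one_liner_temp = N * N * m * itemsize         # size of each big temporary, in bytes

# the loop only ever works on (N, N) arrays, one coordinate dimension at a time
loop_temp = N * N * itemsize                  # size of each temporary inside the loop

print(one_liner_temp / 1e6, "MB per big temporary (one-liner)")
print(loop_temp / 1e6, "MB per temporary (loop)")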
Any other thoughts about how to account for these results?
In particular, is there a useful rule--regarding when to loop and when to index--that can be extracted from this example as guidance for writing NumPy code?
For those not familiar with NumPy, or who haven't looked at the code, this comparison is not based on an edge case--it certainly wouldn't be that interesting to me if it were. Instead, it involves a function that performs a common task in matrix computation (i.e., creating a result array given two antecedents); moreover, each function is in turn composed of some of the most common NumPy built-ins.