Hi,
I have a very simple python routine that involves cycling through a list of roughly 20,000 latitude,longitude coordinates and calculating the distance of each point to a reference point.
def compute_nearest_points( lat, lon, nPoints=5 ):
"""Find the nearest N points, given the input coordinates."""
points = session.query(PointIndex).all()
oldNearest = []
newNearest = []
for n in xrange(nPoints):
oldNearest.append(PointDistance(None,None,None,99999.0,99999.0))
newNearest.append(obj2)
#This is almost certainly an inappropriate use of deepcopy
# but how SHOULD I be doing this?!?!
for point in points:
distance = compute_spherical_law_of_cosines( lat, lon, point.avg_lat, point.avg_lon )
k = 0
for p in oldNearest:
if distance < p.distance:
newNearest[k] = PointDistance(
point.point, point.kana, point.english, point.avg_lat, point.avg_lon, distance=distance
)
break
else:
newNearest[k] = deepcopy(oldNearest[k])
k += 1
for j in range(k,nPoints-1):
newNearest[j+1] = deepcopy(oldNearest[j])
oldNearest = deepcopy(newNearest)
#We're done, now print the result
for point in oldNearest:
print point.station, point.english, point.distance
return
I initially wrote this in C, using the exact same approach, and it works fine there, and is basically instantaneous for nPoints<=100. So I decided to port it to python because I wanted to use SqlAlchemy to do some other stuff.
I first ported it without the deepcopy statements that now pepper the method, and this caused the results to be 'odd', or partially incorrect, because some of the points were just getting copied as references(I guess? I think?) -- but it was still pretty nearly as fast as the C version.
Now with the deepcopy calls added, the routine does it's job correctly, but it has incurred an extreme performance penalty, and now takes several seconds to do the same job.
This seems like a pretty common job, but I'm clearly not doing it the pythonic way. How should I be doing this so that I still get the correct results but don't have to include deepcopy everywhere?
EDIT:
I've hit on a much simpler and faster solution,
def compute_nearest_points2( lat, lon, nPoints=5 ):
"""Find the nearest N points, given the input coordinates."""
points = session.query(PointIndex).all()
nearest = []
for point in points:
distance = compute_spherical_law_of_cosines( lat, lon, point.avg_lat, point.avg_lon )
nearest.append(
PointDistance(
point.point, point.kana, point.english, point.avg_lat, point.avg_lon, distance=distance
)
)
nearest_points = sorted(nearest, key=lambda point: point.distance)[:nPoints]
for item in nearest_points:
print item.point, item.english, item.distance
return
So basically I'm just making a complete copy of the input and appending a new value - the distance from the reference point. Then I just apply 'sorted' to the resulting list, specifying that the sort key should be the distance property of the PointDistance object.
This is much faster than using deepcopy although I confess I don't really understand why. I guess it is down to the efficient C implementations python's "sorted"?