I have a function which updates the centroid (mean) in a K-means algoritm. I ran a profiler and noticed that this function uses a lot of computing time.
It looks like:
def updateCentroid(self, label):
X=[]; Y=[]
for point in self.clusters[label].points:
X.append(point.x)
Y.append(point.y)
self.clusters[label].centroid.x = numpy.mean(X)
self.clusters[label].centroid.y = numpy.mean(Y)
So I ponder, is there a more efficient way to calculate the mean of these points? If not, is there a more elegant way to formulate it? ;)
EDIT:
Thanks for all great responses! I was thinking that perhaps I can calculate the mean cumulativly, using something like:
where x_bar(t) is the new mean and x_bar(t-1) is the old mean.
Which would result in a function similar to this:
def updateCentroid(self, label):
cluster = self.clusters[label]
n = len(cluster.points)
cluster.centroid.x *= (n-1) / n
cluster.centroid.x += cluster.points[n-1].x / n
cluster.centroid.y *= (n-1) / n
cluster.centroid.y += cluster.points[n-1].y / n
Its not really working but do you think this could work with some tweeking?