You will probably often be disappointed by the solution that any particular run of "the k-means algorithm" (i.e. Lloyd's algorithm) comes up with. That's because Lloyd's algorithm often gets stuck in bad local minima.
Luckily, Lloyd's is just one way of solving k-means. And there's an approach that almost always finds a better local minimum.
The trick is to update the cluster assignments of the data points one at a time. You can do this efficiently by keeping a count n of the number of points assigned to each mean, so that you can recalculate the cluster mean m after removing a point x as follows:

m_new = (n * m - x) / (n - 1)

And add x to a cluster with mean m using:

m_new = (n * m + x) / (n + 1)
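As a quick sanity check, the two update formulas can be verified numerically. This is just a sketch; the variable names mirror the equations above:

```python
import numpy as np

# Three points currently assigned to one cluster.
pts = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]])
n = len(pts)
m = pts.mean(axis=0)  # current cluster mean

# Remove the last point incrementally: m_new = (n*m - x) / (n - 1)
x = pts[2]
m_removed = (n * m - x) / (n - 1)
assert np.allclose(m_removed, pts[:2].mean(axis=0))

# Add it back with n now equal to 2: m_new = (n*m + x) / (n + 1)
m_restored = (2 * m_removed + x) / (2 + 1)
assert np.allclose(m_restored, m)
```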
Of course, because the one-point-at-a-time updates can't be vectorized, this is a bit painful to run in MATLAB, but it's not too bad in other languages.
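A minimal Python/NumPy sketch of the whole procedure might look like the following. The function name, the random initialization, and the stopping rule are all illustrative choices, not part of the original description; the key lines are the two incremental mean updates:

```python
import numpy as np

def kmeans_one_at_a_time(X, k, n_iters=100, seed=0):
    """Sketch of k-means with one-point-at-a-time reassignment.

    Cluster means are updated incrementally with the remove/add
    formulas from the text, so each move costs O(k*d) instead of
    a full recomputation of the means.
    """
    rng = np.random.default_rng(seed)
    n_pts, d = X.shape
    # Random initial assignment, forcing every cluster to be nonempty.
    assign = rng.integers(0, k, size=n_pts)
    assign[:k] = np.arange(k)
    counts = np.bincount(assign, minlength=k).astype(float)
    means = np.array([X[assign == j].mean(axis=0) for j in range(k)])

    for _ in range(n_iters):
        moved = False
        for i in range(n_pts):
            old = assign[i]
            if counts[old] <= 1:
                continue  # never empty a cluster
            x = X[i]
            # Remove x from its cluster: m_new = (n*m - x) / (n - 1)
            means[old] = (counts[old] * means[old] - x) / (counts[old] - 1)
            counts[old] -= 1
            # Reassign x to the nearest current mean.
            new = int(np.argmin(((means - x) ** 2).sum(axis=1)))
            # Add x to that cluster: m_new = (n*m + x) / (n + 1)
            means[new] = (counts[new] * means[new] + x) / (counts[new] + 1)
            counts[new] += 1
            assign[i] = new
            if new != old:
                moved = True
        if not moved:
            break  # no point changed cluster in a full pass
    return assign, means
```

On two well-separated blobs, for example, a call like `kmeans_one_at_a_time(X, 2)` settles into the obvious partition within a couple of passes.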
If you are really keen on getting the best possible local minimum, and you don't mind using exemplar-based clustering, you should look at affinity propagation. MATLAB implementations are available on the Frey lab affinity propagation page.