views: 113

answers: 2
Between k-means and KSOM (Kohonen's Self-Organizing Map), which one gives the better retrieval performance? And how do you calculate that performance?

A: 

You need to define your proposed use precisely. It's very likely that any two alternative methods represent a trade-off: for some kinds of usage one will be faster than the other, or one will give greater precision than the other. This is pretty much the challenge with any computer system; published benchmarks cannot reliably be transferred to specific applications, so you very often need to test using your own data and patterns of usage.

What's worse, you often find that quite small changes in requirements make significant changes in relative performance.

So I'm afraid that you need to perform your own benchmarking and testing for your particular applications.

djna
+1  A: 

I think you should say more about your usage (in terms of the shape of the data, the number of elements, and the priors you can know about the data before running clustering techniques). K-means is a very light and fast algorithm, but it has major drawbacks:

  1. Initialization: the result depends on the (typically random) initial centroids, as the algorithm itself contains no "local minima avoidance" rule; in practice you run several random restarts and keep the best solution (see the sketch after this list).
  2. Number of clusters: you have to know in advance how many clusters you are going to map onto the data.
  3. No dependency on the "shape" of clusters: K-means aims to balance the size of the partitions in the space, and in the implementations found in the literature there is (almost) no way to tweak the flow w.r.t. other parameters (second-order statistics, measures of compactness, and so on).
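
A minimal K-means sketch (NumPy only, my own illustrative code rather than anything from a specific library) that makes the first two drawbacks concrete: k is fixed up front, and because the random initialization can land in a poor local minimum, several restarts are run and the most compact solution is kept.

    import numpy as np

    def kmeans(X, k, n_iter=100, seed=0):
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), size=k, replace=False)]   # random initialization
        for _ in range(n_iter):
            # assignment step: nearest centroid for every point
            dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # update step: mean of each cluster (keep the old centroid if a cluster empties)
            centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        sse = ((X - centroids[labels]) ** 2).sum()                  # within-cluster SSE
        return labels, centroids, sse

    X = np.random.default_rng(1).normal(size=(300, 2))
    # k must be chosen in advance; run several random restarts and keep the most compact one
    labels, centroids, sse = min((kmeans(X, k=3, seed=s) for s in range(10)),
                                 key=lambda run: run[2])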

On the other hand, SOM (or KSOM, as you call it) is mostly used for taxonomies or for subdividing spaces with strong measures of fitness, and it can take advantage of more structured priors than K-means. You can select your own kernel function to impose constraints on the network's shape, among many other advanced practices that deserve more room than a couple of lines. Drawbacks: the training stage is not as fast as K-means, and it is not useful in certain domains (when the kernel function does not approximate the local data dispersion well).
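
For comparison, here is a minimal SOM training sketch (again NumPy, illustrative only); the Gaussian neighborhood kernel, the 10x10 grid, and the exponential decay schedules are assumptions I picked for the example, and the kernel is exactly the part you would swap out to impose your own constraints on the network's shape.

    import numpy as np

    def train_som(X, grid=(10, 10), n_epochs=20, lr0=0.5, sigma0=3.0, seed=0):
        rng = np.random.default_rng(seed)
        h, w = grid
        weights = rng.normal(size=(h * w, X.shape[1]))        # one codebook vector per node
        coords = np.array([(i, j) for i in range(h) for j in range(w)], dtype=float)
        n_steps, t = n_epochs * len(X), 0
        for _ in range(n_epochs):
            for x in rng.permutation(X):
                lr = lr0 * np.exp(-t / n_steps)                # decaying learning rate
                sigma = sigma0 * np.exp(-t / n_steps)          # shrinking neighborhood
                bmu = np.argmin(((weights - x) ** 2).sum(axis=1))   # best matching unit
                grid_d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
                kernel = np.exp(-grid_d2 / (2 * sigma ** 2))   # Gaussian neighborhood kernel
                weights += lr * kernel[:, None] * (x - weights)     # pull neighbors toward x
                t += 1
        return weights.reshape(h, w, -1)

    som = train_som(np.random.default_rng(1).normal(size=(500, 3)))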

Hope this helps.

ZZambia
Yes, I know that K-means trains quite a bit faster than SOM; SOM takes longer because it needs many iterations of the algorithm. How about accuracy (precision and recall)? Can we use the F-measure formula from information retrieval for measuring training and testing accuracy? F-measure = 2 · (Precision · Recall) / (Precision + Recall)
henrika
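
Assuming a labeled test set, a sketch of how the F-measure from the comment above could be computed for cluster-based retrieval: treat the other members of a query's cluster as the retrieved set and the items sharing the query's true label as the relevant set. The protocol and the names here are my assumptions, not a standard API.

    import numpy as np

    def retrieval_f_measure(cluster_labels, true_labels):
        cluster_labels = np.asarray(cluster_labels)
        true_labels = np.asarray(true_labels)
        scores = []
        for q in range(len(true_labels)):
            retrieved = cluster_labels == cluster_labels[q]   # same cluster as the query
            relevant = true_labels == true_labels[q]          # same true class as the query
            retrieved[q] = relevant[q] = False                # exclude the query itself
            tp = np.sum(retrieved & relevant)
            precision = tp / retrieved.sum() if retrieved.sum() else 0.0
            recall = tp / relevant.sum() if relevant.sum() else 0.0
            scores.append(2 * precision * recall / (precision + recall)
                          if (precision + recall) else 0.0)
        return float(np.mean(scores))                         # averaged over all queries

    # e.g. retrieval_f_measure(kmeans_labels, ground_truth_labels)
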
It's not a matter of "training". You don't have to train K-means; it's an unsupervised technique. Instead you have to choose your "priors" when dealing with K-means, and the priors depend on the data. There are several examples in the literature of two-step algorithms: the first step estimates the priors (and, for K-means, also the number of clusters) and the second actually does the computation. As for per-point accuracy measures for K-means, the assumptions in the algorithm are too weak to ensure accuracy while iterating. I suggest you look at compactness measures computed after each iteration step.
ZZambia
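
One concrete example of a compactness measure in the spirit of the last comment, assuming you log it after each iteration: the within-cluster sum of squared distances to the centroids, total and per cluster. This particular measure is my choice for the sketch; other indices (silhouette, Davies-Bouldin) would serve the same purpose.

    import numpy as np

    def compactness(X, labels, centroids):
        # squared distance of each point to its assigned centroid
        d2 = ((X - centroids[labels]) ** 2).sum(axis=1)
        per_cluster = np.array([d2[labels == j].sum() for j in range(len(centroids))])
        return d2.sum(), per_cluster      # lower is more compact; log this every iteration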