I'm mainly looking for a discussion of approaches for going from decentralized, non-normalized, completely open user-submitted tags to making sense of them all by combining them into the semantic groups they called "clusters".

Does it take actual people to figure out what people actually mean by the tags used, or can it be done simply by automatically analyzing how often the tags go together?

That kind of stuff. Feel free to elaborate wildly :) (Also, if this has been discussed elsewhere, I'd love to hear about it).
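
To make the "how often the tags go together" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the sample data, the function names `cooccurrence_counts` and `cluster_tags`, and the `min_count` threshold are all made up, and grouping by connected components is just the simplest thing that could work, not an established method for this problem.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(tagged_items):
    """Count how often each unordered pair of tags appears on the same item."""
    pair_counts = Counter()
    for tags in tagged_items:
        # sorted(set(...)) removes duplicates and gives each pair one canonical order
        for a, b in combinations(sorted(set(tags)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts

def cluster_tags(tagged_items, min_count=2):
    """Group tags linked by at least `min_count` co-occurrences (connected components)."""
    neighbours = {}
    for (a, b), n in cooccurrence_counts(tagged_items).items():
        if n >= min_count:
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)
    clusters, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        # depth-first walk over the "co-occurs often enough" graph
        component, stack = set(), [start]
        while stack:
            tag = stack.pop()
            if tag in component:
                continue
            component.add(tag)
            stack.extend(neighbours[tag] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters

# Hypothetical sample data: each inner list is the tag set of one item.
items = [
    ["ruby", "rails", "web"],
    ["ruby", "rails"],
    ["rails", "web"],
    ["kmeans", "clustering"],
    ["kmeans", "clustering", "dbscan"],
    ["dbscan", "clustering"],
    ["ruby", "clustering"],
]
clusters = cluster_tags(items, min_count=2)
print(clusters)
```

One obvious weakness of something this naive: a single ambiguous tag that co-occurs often with two unrelated groups would merge them into one cluster, which is part of why I'm asking whether people need to be in the loop.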

+3  A: 

Read this article: Automated Tag Clustering. It provides a good overview of the existing approaches and describes the algorithms for tag clustering.

Igor Krivokon
Exactly the kind of thing I was looking for, thanks!
Baby Diego
+1  A: 

Algorithms of the Intelligent Web (Manning) (esp. Chapter 4) and a book with a similar title from O'Reilly cover clustering algorithms. The Manning book starts with naive SQL approaches and moves to K-means, ROCK, and DBSCAN. It's more generalized than just focusing on tags, but easy to apply in that context. Code is presented in Java but is easily adapted to Ruby (sometimes more easily than adapting the Java code to your problem).
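
To give a feel for how one of these applies to tags, here is a rough DBSCAN sketch in Python (not the book's Java code). It assumes each tag is represented by the set of item ids it was applied to, and it uses Jaccard distance between those sets; the data, the names `jaccard_distance`, `dbscan`, and `tag_items`, and the `eps` and `min_pts` values are all assumptions picked for illustration.

```python
def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of item ids."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def dbscan(points, eps, min_pts):
    """Plain DBSCAN over a dict of name -> set. Returns name -> cluster id (-1 = noise)."""
    names = sorted(points)

    def region(name):
        # All points within eps of `name`, including `name` itself.
        return [other for other in names
                if jaccard_distance(points[name], points[other]) <= eps]

    labels = {}
    cluster_id = -1
    for name in names:
        if name in labels:
            continue
        neighbours = region(name)
        if len(neighbours) < min_pts:
            labels[name] = -1  # provisionally noise; may become a border point later
            continue
        cluster_id += 1
        labels[name] = cluster_id
        queue = [n for n in neighbours if n != name]
        while queue:
            current = queue.pop()
            if labels.get(current) == -1:
                labels[current] = cluster_id  # noise reached from a core point: border point
            if current in labels:
                continue
            labels[current] = cluster_id
            current_neighbours = region(current)
            if len(current_neighbours) >= min_pts:
                queue.extend(current_neighbours)  # core point: keep expanding
    return labels

# Hypothetical data: tag -> ids of the items carrying that tag.
tag_items = {
    "ruby": {1, 2, 3},
    "rails": {1, 2, 3, 4},
    "web": {2, 3, 4},
    "kmeans": {5, 6},
    "dbscan": {5, 6, 7},
    "clustering": {5, 6, 7},
    "misc": {8},
}
labels = dbscan(tag_items, eps=0.4, min_pts=2)
print(labels)
```

With this toy data the two groups of related tags end up in separate clusters and "misc" is left as noise. The nice property for open tagging is that you don't have to pick the number of clusters up front, though `eps` and `min_pts` still need tuning on real data.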

Chapter 5 covers classification, which is about building taxonomies, and discusses Bayesian algorithms.
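
As a rough illustration of the Bayesian side, here is a small naive Bayes sketch in Python that assigns an item to a category based on its tags. This is not taken from the book: the categories, the training data, and the function names `train` and `classify` are made up, and it assumes you already have some hand-labelled items to learn from.

```python
import math
from collections import Counter, defaultdict

def train(labelled_items):
    """labelled_items: iterable of (category, tags). Returns the counts the classifier needs."""
    category_counts = Counter()
    tag_counts = defaultdict(Counter)
    vocabulary = set()
    for category, tags in labelled_items:
        category_counts[category] += 1
        for tag in set(tags):
            tag_counts[category][tag] += 1
            vocabulary.add(tag)
    return category_counts, tag_counts, vocabulary

def classify(tags, model):
    """Return the category with the highest log-probability under naive Bayes."""
    category_counts, tag_counts, vocabulary = model
    total = sum(category_counts.values())
    best, best_score = None, float("-inf")
    for category in sorted(category_counts):
        score = math.log(category_counts[category] / total)  # log prior
        # Laplace (add-one) smoothing so unseen tags don't zero out a category
        denom = sum(tag_counts[category].values()) + len(vocabulary)
        for tag in set(tags):
            score += math.log((tag_counts[category][tag] + 1) / denom)
        if score > best_score:
            best, best_score = category, score
    return best

# Hypothetical hand-labelled training data.
model = train([
    ("programming", ["ruby", "rails"]),
    ("programming", ["ruby", "web"]),
    ("animals", ["cat", "zoo"]),
    ("animals", ["jaguar", "zoo"]),
])
print(classify(["rails", "web"], model))
print(classify(["zoo", "cat"], model))
```

Unlike clustering, this needs people to label some examples first, so it fits the "actual people figure out the meaning, then the machine generalizes" end of the spectrum.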

JasonTrue