How does one automatically find categories for text based on content?

A: 

It depends... a lot... Can you elaborate?

cazlab
+1  A: 

There is a good paper written on this: http://www.cs.utexas.edu/users/hyukcho/classificationAlgorithm.html

Geoffrey Chetwood
A: 

The best way to categorize content, be it text or multimedia, is to use a taxonomy. Most well-known CMSs have built-in support for taxonomies, and Drupal has some of the best taxonomy support among them.

Jahangir
I don't think I'd call this the best way. I'd call it *a way*.
Gregg Lind
+1  A: 
  1. Read Data Mining: Practical Machine Learning Tools and Techniques by Ian H. Witten and Eibe Frank
  2. Use Weka or Orange
Roberto Russo
A: 

I would encourage you to look at the text classification libraries bundled with the Natural Language Toolkit (NLTK). Even if you're not familiar with Python, I think you'll find the API rather intuitive. There are many good examples in the NLTK Book, and the people on the mailing list are quite helpful as well.
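
To give a feel for what a text classifier of this kind does under the hood, here is a minimal sketch of a naive Bayes classifier in plain Python, with no library dependencies. It is not NLTK's API; the function names (`tokenize`, `train`, `classify`) and the toy training data are made up for illustration, and a real project would more likely rely on a library's own classifiers and a better tokenizer.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()


def train(labeled_texts):
    """Count words per category from (text, category) pairs."""
    word_counts = defaultdict(Counter)  # category -> word -> count
    doc_counts = Counter()              # category -> number of training texts
    for text, category in labeled_texts:
        doc_counts[category] += 1
        word_counts[category].update(tokenize(text))
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Return the category with the highest naive Bayes log-probability."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, float("-inf")
    for category in doc_counts:
        # Log prior: how common the category is in the training data.
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for word in tokenize(text):
            # Add-one (Laplace) smoothing so unseen words don't zero the score.
            count = word_counts[category][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical toy training data, invented for illustration only.
TRAINING = [
    ("the team won the match", "sports"),
    ("a late goal decided the game", "sports"),
    ("the new phone has a fast processor", "tech"),
    ("the laptop ships with more memory", "tech"),
]

if __name__ == "__main__":
    model = train(TRAINING)
    print(classify("who won the game", *model))
    print(classify("a fast laptop processor", *model))
```

The key requirement is a set of texts that already carry category labels; the classifier only learns which words tend to go with which label. With four training sentences the result is fragile, so treat this as a demonstration of the mechanics rather than something to deploy.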

theycallmemorty