I need to find training words and their classifications. Simple classifications such as Sports, Entertainment, and Politics, things like that.

Where can I find the words and their classifications? I know many universities have done bag-of-words classification. Is there a repository of training examples?

A: 

I do not know of such a list of words, but I can suggest using a copy of Wikipedia and its category system. You can parse the XML version of Wikipedia (I have done that) and collect words from different topics.
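As a rough sketch of that idea, the Python below streams through a dump-like XML document and counts words per category. The sample XML, the helper names (`local_name`, `words_by_category`), and the regex for `[[Category:...]]` links are my own assumptions for illustration; a real dump is far larger, uses a namespaced root element, and has messier wikitext than this handles.

```python
import io
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Tiny stand-in for a Wikipedia XML dump (made-up pages, for illustration only).
SAMPLE_DUMP = """<mediawiki>
  <page>
    <title>Football</title>
    <revision><text>Football is a team sport. [[Category:Sports]]</text></revision>
  </page>
  <page>
    <title>Election</title>
    <revision><text>An election is a vote. [[Category:Politics]]</text></revision>
  </page>
</mediawiki>"""

# Matches [[Category:Name]] and [[Category:Name|sort key]]; group 1 is the name.
CATEGORY_LINK_RE = re.compile(r"\[\[Category:([^\]|]+)(?:\|[^\]]*)?\]\]")
WORD_RE = re.compile(r"[a-z]+")


def local_name(tag):
    """Drop an XML namespace prefix such as '{http://...}page'."""
    return tag.rsplit("}", 1)[-1]


def words_by_category(xml_stream):
    """Return {category: Counter of words} for every page in the stream."""
    counts = defaultdict(Counter)
    # iterparse streams the file, so a large dump need not fit in memory.
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        text = ""
        for child in elem.iter():
            if local_name(child.tag) == "text":
                text = child.text or ""
        categories = [c.strip() for c in CATEGORY_LINK_RE.findall(text)]
        body = CATEGORY_LINK_RE.sub(" ", text).lower()
        words = WORD_RE.findall(body)
        for category in categories:
            counts[category].update(words)
        elem.clear()  # free the finished page
    return counts


counts = words_by_category(io.BytesIO(SAMPLE_DUMP.encode("utf-8")))
for category, word_counts in counts.items():
    print(category, word_counts.most_common(3))
```

For a real dump you would pass an open file (or a `bz2.open(...)` handle) instead of the in-memory sample.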

Ross
+1  A: 

This is not exactly what you are looking for, but you might find http://labs.google.com/sets interesting.
You can put in a bunch of words, and it will spit out a list of related words, which you could recursively feed back into the first page to get even more related words.

Alternatively, download a large chunk of Wikipedia articles (where you already know the category of each page [ http://en.wikipedia.org/wiki/Special:Categories ]) and write a simple script to pick words that have high frequency in articles from one category but very low frequency in articles from other categories.
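A minimal sketch of such a script, in Python, might look like the following. The toy `DOCS` corpus, the function names, and the scoring rule (relative frequency in one category divided by the highest relative frequency in any other category, plus a small smoothing constant) are all assumptions of mine; other measures such as TF-IDF or chi-squared would be equally reasonable choices.

```python
from collections import Counter

# Hypothetical toy corpus; in practice these would be the downloaded articles.
DOCS = {
    "Sports": [
        "the team won the match",
        "the striker scored a goal in the match",
    ],
    "Politics": [
        "the senate passed the bill",
        "the president signed the bill",
    ],
}


def relative_frequencies(texts):
    """Map each word to its share of all word occurrences in the texts."""
    counts = Counter(word for text in texts for word in text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def distinctive_words(docs, top_n=3, smoothing=1e-3):
    """For each category, rank words that are frequent there and rare elsewhere."""
    freqs = {cat: relative_frequencies(texts) for cat, texts in docs.items()}
    result = {}
    for cat, cat_freqs in freqs.items():
        scores = {}
        for word, freq in cat_freqs.items():
            # Highest relative frequency of this word in any other category.
            other = max(
                (freqs[o].get(word, 0.0) for o in freqs if o != cat),
                default=0.0,
            )
            # Smoothing avoids division by zero for words unseen elsewhere.
            scores[word] = freq / (other + smoothing)
        result[cat] = sorted(scores, key=scores.get, reverse=True)[:top_n]
    return result


result = distinctive_words(DOCS)
for category, words in result.items():
    print(category, words)
```

Common words like "the" score low because they are frequent in every category, while words that appear in only one category float to the top.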

adi92