I love to eat chicken.
Today I went running, swimming and played basketball.

My objective is to return FOOD and SPORTS just by analyzing these two sentences. How can I do that?

I am familiar with NLP and WordNet. But is there some more high-level, practical, modern technology?

Is there anything that automatically categorizes words for you into "levels"?

More importantly, what is the technical term for this process?

+1  A: 

Google Sets does some of this, and there is some discussion that mentions supersets. However, I have not really seen any technical details in there, just ideas and discussion.

Maybe this could at least help your research...

Doug L.
I entered the items on my wife's bedside table and it came up with 'terrorism'
Pete Kirkham
We're watching you, Kirkham.
Jonathan Feinberg
+2  A: 

That problem is difficult to solve procedurally, but much progress has been made in the area lately.

Most natural language processing begins with a grammar (which may or may not be context-free). It's a set of construction rules stating how more general things are built out of more specific ones.

Example context-free grammar:

Sentence ::= NounPhrase VerbPhrase
NounPhrase ::= ["The"] [Adjective] Noun
Adjective ::= "big" | "small" | "red" | "green"
Noun ::= "cat" | "man" | "house"
VerbPhrase ::= "fell over"

This is obviously oversimplified; the task of writing a complete grammar for all of English is enormous, and most real systems define only the subset applicable to their problem domain.

Once a grammar has been defined (or learned, using complicated algorithms known only to the likes of Google), a string called an "exemplar" is parsed according to the grammar, which tags each word with a part of speech. A very complex grammar would have not just the parts of speech you learned in school, but categories such as "websites", "names of old people", and "ingredients".
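To make the parsing step concrete, here is a minimal sketch of the toy grammar above using NLTK's chart parser (assuming a recent NLTK; its grammar notation differs slightly from the BNF used earlier):

import nltk

# the toy grammar from above, rewritten in NLTK's CFG notation
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det Adj N | Det N | Adj N | N
Det -> 'the'
Adj -> 'big' | 'small' | 'red' | 'green'
N -> 'cat' | 'man' | 'house'
VP -> 'fell' 'over'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the big cat fell over".split()):
    print(tree)  # (S (NP (Det the) (Adj big) (N cat)) (VP fell over))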

These categories can be laboriously built into the grammar by humans, or inferred using techniques like Analogical Modeling or Support Vector Machines. In either approach, things like "chicken", "football", "BBQ", and "cricket" would be defined as points in a very high-dimensional space, along with millions of other points, and clustering algorithms would then define groups based purely on the positions of those points relative to each other. One might then try to infer names for the groups from example text.
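A tiny sketch of that clustering idea, using scikit-learn's KMeans. The three-dimensional vectors below are invented purely for illustration; a real system would derive high-dimensional vectors from corpus co-occurrence counts or learned embeddings:

from sklearn.cluster import KMeans

words = ["chicken", "BBQ", "pizza", "football", "cricket", "swimming"]
vectors = [
    [0.9, 0.1, 0.0],  # chicken  (made-up "food-like" coordinates)
    [0.8, 0.2, 0.1],  # BBQ
    [0.9, 0.0, 0.1],  # pizza
    [0.1, 0.9, 0.8],  # football (made-up "sport-like" coordinates)
    [0.0, 0.8, 0.9],  # cricket
    [0.2, 0.9, 0.7],  # swimming
]

# group the points purely by their relative positions
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
for word, label in zip(words, labels):
    print(word, "-> cluster", label)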

This Google search lists several techniques used in NLP, and you could learn a whole lot from them.

EDIT: To solve just this problem, one might crawl the web for sentences of the form "_ is a _" to build up a database of item-category relationships, then parse a string like the ones above and look for words that are known items in the database.
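A rough sketch of that idea in Python, with a hard-coded string standing in for crawled web text:

import re

corpus = """Chicken is a food. Basketball is a sport.
Swimming is a sport. Pizza is a food."""

# harvest item-category pairs from "X is a Y" sentences
is_a = re.compile(r"(\w+) is an? (\w+)", re.IGNORECASE)
categories = {item.lower(): cat.lower() for item, cat in is_a.findall(corpus)}

# tag words in a new sentence against the harvested database
sentence = "Today I went swimming and ate chicken."
found = {categories[w] for w in re.findall(r"\w+", sentence.lower())
         if w in categories}
print(found)  # {'sport', 'food'}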

Nathan
+1  A: 

The question you ask about is a whole area of research called topical text categorization. A great overview of techniques is "Machine learning in automated text categorization" by Fabrizio Sebastiani, in ACM Computing Surveys. One of the simplest techniques (though not necessarily the best performing) is to gather numerous (hundreds of) example sentences in each category and then train a Naive Bayesian classifier on those sample sentences. NLTK contains a Naive Bayesian classifier in the module nltk.classify.naivebayes.
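For illustration, a minimal sketch of that approach with NLTK's Naive Bayesian classifier, using bag-of-words features. Two example sentences per category are far too few for real use; they are only here to show the shape of the API:

import nltk

def features(sentence):
    # bag-of-words features: presence of each lowercased token
    return {word.lower(): True for word in sentence.split()}

train = [
    (features("I love to eat chicken"), "FOOD"),
    (features("We had pizza and salad for dinner"), "FOOD"),
    (features("Today I went running swimming and played basketball"), "SPORTS"),
    (features("He scored a goal in the football match"), "SPORTS"),
]

classifier = nltk.classify.NaiveBayesClassifier.train(train)
print(classifier.classify(features("She cooked rice and chicken")))  # FOOD, on this toy data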

Ken Bloom
A: 

You might take a look at the WordNet Domains resource from FBK. It is an extension of WordNet designed for text categorization and word sense disambiguation, and it allows varying degrees of granularity.

http://wndomains.fbk.eu/

One possible way to apply it to your task: extract NP chunks from your sentences, take their head words, and look up those head words' categories in WordNet Domains.
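A rough sketch of that pipeline with NLTK. The domains dictionary below is a hypothetical stand-in for a real lookup against the WordNet Domains data files, which map WordNet synsets to domain labels:

import nltk

# hypothetical stand-in for the real WordNet Domains lookup
domains = {"chicken": "gastronomy", "basketball": "sport", "swimming": "sport"}

# simple regex chunker for noun phrases; requires NLTK's tokenizer and
# tagger models (nltk.download('punkt'), nltk.download('averaged_perceptron_tagger'))
chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")

tagged = nltk.pos_tag(nltk.word_tokenize("I love to eat chicken."))
for subtree in chunker.parse(tagged).subtrees(filter=lambda t: t.label() == "NP"):
    head = subtree.leaves()[-1][0].lower()  # take the last word as the head noun
    print(head, "->", domains.get(head, "unknown"))  # chicken -> gastronomy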

Aliaksandr Autayeu