views:

561

answers:

5

I'm trying to implement application that can determine meaning of sentence, by dividing it to smaller pieces. So I need to know what words are subject, object etc. so that my program can know how to handle this sentence.

+2  A: 

I guess there is not "simple" way to do this. You have to build a linguistic analyzer (which is quite possible), however, a language as a lot of exceptional cases. And that is what makes implementing a linguistic analyzer that hard.

Henri
+4  A: 

You should look at the Natural Language Toolkit, which is for exactly this sort of thing.

See this section of the manual: Categorizing and Tagging Words - here's an extract:

>>> text = nltk.word_tokenize("And now for something completely different")
>>> nltk.pos_tag(text)
[('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'),
('completely', 'RB'), ('different', 'JJ')]

"Here we see that and is CC, a coordinating conjunction; now and completely are RB, or adverbs; for is IN, a preposition; something is NN, a noun; and different is JJ, an adjective."

RichieHindle
NLTK is a good resource for this sort of thing, but part-of-speech tagging doesn't give enough grammatical information for distinguishing between subject/object roles. I think Chapter 8 of the manual (Analyzing Sentence Structure) would be more appropriate.
StompChicken
+5  A: 

This is an open research problem. You can get an overview on Wikipedia, http://en.wikipedia.org/wiki/Natural_language_processing. Consider phrases like "Time flies like an arrow, fruit flies like a banana" - unambiguously classifying words is not easy.

Barry Kelly
+1, I started an answer with that exact same quote! :)
j_random_hacker
+2  A: 

The specific problem you mention, the identification of the subject and objects of a clause, is accomplished by syntactic parsing. You can get a good idea of how parsing works by using this demo of parsing software developed by Stanford University.

However, syntactic parsing does not determine the meanining of a sentence, only its structure. Determining meaning (semantics) is a very hard problem in general and there is no technology that can really 'understand' a sentence in the same way that a human would. Although there is no general solution, you may be able to do something in a very restricted subject domain. For example, is the data you want to analyse about a narrow topic with a limited set of 'things' that people talk about?

StompChicken
+1  A: 

StompChicken has given the right answer to this question, but I'd like to add that the concepts of subject and object are known as grammatical relations, and that Briscoe and Carroll's RASP is a parser that can go the extra step of deducing a list of relations from the parse.

Here's some example output from their demo page. It's an extract from the output for a sentence that begins "We describe a robust accurate domain-independent approach...":

(|ncsubj| |describe:2_VV0| |We:1_PPIS2| _)
(|dobj| |describe:2_VV0| |approach:7_NN1|)

Tommy Herbert