Any recommendations for languages or libraries to convert a sentence like:

"X bumped Y, who in turn kicked Z."

to

  1. X: Bumped
  2. Y: Was bumped, kicked Z
+2  A: 

To blatantly rip off this answer, why not try the Natural Language Toolkit?

Skilldrick
+3  A: 

I would suggest you use the Stanford Parser (http://nlp.stanford.edu/software/lex-parser.shtml), which is open source and relatively simple, as these things go. With it, you can extract a typed dependency parse. A dependency parse of a sentence basically decomposes a sentence into a set of binary relations r(B, A), where word A grammatically depends on word B.

Take your sentence

X bumped Y, who in turn kicked Z.

In this sentence, both X and Y depend on bumped to get their grammatical relationship in this sentence. The Stanford Parser would extract the following relations for them:

nsubj(bumped, X)
dobj(bumped, Y)

This means the subject of bumped is X and the direct object of bumped is Y. You could then use this information to make a grammatical relation: bumped(X, Y). Likewise, the Stanford Parser extracts the following relations for the rest of the sentence:

nsubj(kicked, who)
rcmod(Y, kicked)
dobj(kicked, Z)

In this case, the subject of kicked is "who", and rcmod(Y, kicked) marks kicked as the head of a relative clause modifying Y (rcmod stands for relative clause modifier). I'm not sure what the goal of your system is, but you would probably find that you need to construct a bunch of rules manually to cover such situations. In this case, your rule could replace the relative-pronoun nsubj ("who") with the noun that the rcmod attaches to (Y) in order to produce kicked(Y, Z).
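
As a rough sketch of what such a hand-written rule might look like, here is some Python that works on the relations listed above. It does not call the Stanford Parser; it assumes you have already captured the parser output as plain text in the relation(head, dependent) form shown here, without token indices. The names Dependency, parse_dependencies, extract_relations and RELATIVE_PRONOUNS are made up for this illustration, and the pronoun list is an assumption, not something the parser provides.

```python
import re
from collections import namedtuple

# One typed dependency, e.g. nsubj(bumped, X): `dependent` depends on `head`.
Dependency = namedtuple("Dependency", ["relation", "head", "dependent"])

# Matches text such as "nsubj(bumped, X)" (assumed format, no token indices).
DEP_PATTERN = re.compile(r"(\w+)\(\s*([^,\s]+)\s*,\s*([^)\s]+)\s*\)")

# Assumed list of relative pronouns that the rule is allowed to replace.
RELATIVE_PRONOUNS = {"who", "whom", "which", "that"}


def parse_dependencies(text):
    """Turn relation(head, dependent) strings into Dependency tuples."""
    return [Dependency(*m.groups()) for m in DEP_PATTERN.finditer(text)]


def extract_relations(deps):
    """Return (verb, subject, object) triples for verbs that have both."""
    subjects = {d.head: d.dependent for d in deps if d.relation == "nsubj"}
    objects = {d.head: d.dependent for d in deps if d.relation == "dobj"}
    # rcmod(Y, kicked): the clause headed by "kicked" modifies the noun Y.
    antecedents = {d.dependent: d.head for d in deps if d.relation == "rcmod"}

    relations = []
    for verb, subject in subjects.items():
        # Rule: a relative pronoun subject stands for the modified noun.
        if subject.lower() in RELATIVE_PRONOUNS and verb in antecedents:
            subject = antecedents[verb]
        # Verbs without a direct object are skipped in this sketch.
        if verb in objects:
            relations.append((verb, subject, objects[verb]))
    return relations


sample = """
nsubj(bumped, X)
dobj(bumped, Y)
nsubj(kicked, who)
rcmod(Y, kicked)
dobj(kicked, Z)
"""

for verb, subj, obj in extract_relations(parse_dependencies(sample)):
    print("{}({}, {})".format(verb, subj, obj))
```

For the five relations above this prints bumped(X, Y) and kicked(Y, Z). Passives, conjunctions, and intransitive verbs would each need rules of their own.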

For more information on using the Stanford Parser typed dependencies, there is an excellent tutorial on the subject at the Stanford Parser website (http://nlp.stanford.edu/software/dependencies%5Fmanual.pdf).

ealdent
+1  A: 

The Stanford Parser, as suggested by ealdent, would do the job. I would prefer to encode the result as:

  • Bump(X,Y,Past)
  • Kick(Y,Z,Past)

A POS tagger could also work, but your sentence is complicated ("who in turn").
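
To illustrate the encoding above, here is a small Python sketch that builds such a record from a verb, its two arguments, and the verb's POS tag. The names Event, encode and render are hypothetical, the LEMMAS table is a toy stand-in for a real lemmatizer, and mapping Penn Treebank verb tags to a coarse tense label is my own simplification.

```python
from collections import namedtuple

# Predicate(agent, patient, tense), mirroring the Bump(X,Y,Past) encoding.
Event = namedtuple("Event", ["predicate", "agent", "patient", "tense"])

# Penn Treebank verb tags mapped to a coarse tense label (a simplification).
TENSE_BY_TAG = {"VBD": "Past", "VBZ": "Present", "VBP": "Present"}

# Toy lemma table for illustration only; a real system needs a lemmatizer.
LEMMAS = {"bumped": "bump", "kicked": "kick"}


def encode(verb, agent, patient, pos_tag):
    """Build an Event from a surface verb form and its POS tag."""
    lemma = LEMMAS.get(verb.lower(), verb.lower())
    tense = TENSE_BY_TAG.get(pos_tag, "Unknown")
    return Event(lemma.capitalize(), agent, patient, tense)


def render(event):
    """Format an Event as Predicate(agent,patient,tense)."""
    return "{}({},{},{})".format(*event)


print(render(encode("bumped", "X", "Y", "VBD")))
print(render(encode("kicked", "Y", "Z", "VBD")))
```

This prints Bump(X,Y,Past) and Kick(Y,Z,Past). The subject and object still have to come from somewhere, such as a dependency parse; the POS tag alone only supplies the tense.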

Osama ALASSIRY
Annotating the relation to reflect tense would be good; I was just giving an example. You might want to add other annotations (like index in sentence, part of speech) as well. A POS tagger would give you the POS tags in order, but you'd have to come up with rules determining whether something is a direct object or a subject, and that will quickly get too complicated unless you're a syntactic rules junkie.
ealdent
A: 

Apart from the Stanford parser, RASP is a possibility too - it can produce lists of grammatical relations as part of its output. See this question.

Tommy Herbert