Any recommendations for languages/libraries to convert a sentence like:
"X bumped Y, who in turn kicked Z."
to
- X: Bumped
- Y: Was bumped, kicked Z
To blatantly rip off this answer, why not try the Natural Language Toolkit?
I would suggest you use the Stanford Parser (http://nlp.stanford.edu/software/lex-parser.shtml), which is open source and relatively simple, as these things go. With it, you can extract a typed dependency parse. A dependency parse of a sentence basically decomposes a sentence into a set of binary relations r(B, A), where word A grammatically depends on word B.
Take your sentence
X bumped Y, who in turn kicked Z.
In this sentence, both X and Y depend on bumped to get their grammatical relationship in this sentence. The Stanford Parser would extract the following relations for them:
nsubj(bumped, X)
dobj(bumped, Y)
This means the subject of bumped is X and the direct object of bumped is Y. You could then use this information to make a grammatical relation: bumped(X, Y). Likewise, the Stanford Parser extracts the following relations for the rest of the sentence:
nsubj(kicked, who)
rcmod(Y, kicked)
dobj(kicked, Z)
In this case, the subject of kicked is "who", and kicked is attached to Y as an rcmod (relative clause modifier). I'm not sure what the goal of your system is, but you would probably find that you need to construct a bunch of rules manually to cover situations like this. Here, your rule could equate the noun governing the rcmod (Y) with the nsubj of kicked ("who") in order to produce kicked(Y, Z).
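As a rough illustration of such hand-written rules, here is a minimal Python sketch. It does not call the Stanford Parser at all: the dependency triples are simply copied from the output quoted above, and the names DEPS, RELATIVE_PRONOUNS, extract_predicates and summarise are made up for this example. It pairs each verb's nsubj with its dobj, and when the subject is a relative pronoun it swaps in the noun that governs the rcmod.

```python
from collections import defaultdict

# Typed dependencies copied from the parser output quoted above,
# written as (relation, governor, dependent) triples.
DEPS = [
    ("nsubj", "bumped", "X"),
    ("dobj", "bumped", "Y"),
    ("nsubj", "kicked", "who"),
    ("rcmod", "Y", "kicked"),
    ("dobj", "kicked", "Z"),
]

# Assumption for this sketch: a small, hand-picked pronoun list.
RELATIVE_PRONOUNS = {"who", "whom", "which", "that"}


def extract_predicates(deps):
    """Turn nsubj/dobj pairs into (verb, subject, object) triples."""
    subj, obj, antecedent = {}, {}, {}
    for rel, gov, dep in deps:
        if rel == "nsubj":
            subj[gov] = dep
        elif rel == "dobj":
            obj[gov] = dep
        elif rel == "rcmod":
            # rcmod(noun, verb): the verb heads a relative clause on noun.
            antecedent[dep] = gov

    preds = []
    for verb, s in subj.items():
        # The rule from the answer: replace a relative-pronoun subject
        # with the noun that the relative clause modifies.
        if verb in antecedent and s.lower() in RELATIVE_PRONOUNS:
            s = antecedent[verb]
        preds.append((verb, s, obj.get(verb)))
    return preds


def summarise(preds):
    """Group what each entity did, and what was done to it."""
    roles = defaultdict(list)
    for verb, s, o in preds:
        roles[s].append(verb if o is None else f"{verb} {o}")
        if o is not None:
            roles[o].append(f"was {verb}")
    return dict(roles)


if __name__ == "__main__":
    for entity, actions in summarise(extract_predicates(DEPS)).items():
        print(f"- {entity}: {', '.join(actions)}")
```

For real input you would replace DEPS with triples read from the parser's typed-dependency output, and you would almost certainly need more rules (passives, conjunctions, other pronouns) than this one.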
For more information on using the Stanford Parser typed dependencies, there is an excellent tutorial on the subject at the Stanford Parser website (http://nlp.stanford.edu/software/dependencies%5Fmanual.pdf).
The Stanford Parser, as suggested by ealdent, would do the job. I would prefer to encode it as:
A POS tagger could also work, but your sentence is complicated ("who in turn").
Apart from the Stanford parser, RASP is a possibility too - it can produce lists of grammatical relations as part of its output. See this question.