I am trying to build an NLP system for an assignment, for which I am allowed to use external libraries.
I am using parse trees to break down sentences into their constituent parts down to nouns, verbs, etc.
I am looking for a library or software that would let me identify which lexical form a word is in (singular or plural, which verb form, and so on), and possibly convert it to some other form for me.
Basically, I need something with functions like isPlural, singularize, getInfinitive, etc.
I have considered the Ruby Linguistics package and a simple Porter Stemmer (for infinitives) but neither is very good.
This does not seem like a very hard problem, just very tedious.
Does anyone know of a good package/library/software that could do things like that?

+1  A: 

Typically, in order to build a parse tree of a sentence, one needs to first determine the part-of-speech and lemma information of the words in the sentence. So, you should have this information already.

But in any case, in order to map wordforms to their lemmas, and synthesize wordforms from lemmas, take a look at morpha and morphg, and also the Java version of (or front-end to) morphg contained in the SimpleNLG package. There are methods like getInfinitive, getPastParticiple, etc. See e.g. the API for the Verb class.
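To make the wordform-to-lemma mapping concrete, here is a minimal Python sketch of one common approach: a small exception lexicon checked first, then ordered suffix rules. This is not the morpha, morphg, or SimpleNLG API. Every name in it (`singularize`, `is_plural`, `get_infinitive`, the rule tables) is hypothetical and only mirrors the function names from the question, and the word lists are tiny toy samples.

```python
# Toy illustration only: NOT the morpha/morphg/SimpleNLG API.
# All names and word lists here are hypothetical examples.

# Irregular forms are looked up first, before any suffix rule is tried.
IRREGULAR_NOUNS = {"children": "child", "mice": "mouse", "men": "man", "feet": "foot"}
IRREGULAR_VERBS = {"went": "go", "gone": "go", "is": "be", "was": "be",
                   "were": "be", "ate": "eat", "eaten": "eat"}

# (suffix, replacement) pairs, tried in order; the first match wins.
# The ("ss", "ss") entry protects words such as "glass" from the bare "s" rule.
NOUN_RULES = [("ies", "y"), ("ches", "ch"), ("shes", "sh"), ("xes", "x"),
              ("sses", "ss"), ("ss", "ss"), ("s", "")]
VERB_RULES = [("ies", "y"), ("ied", "y"), ("ches", "ch"), ("shes", "sh"),
              ("sses", "ss"), ("xes", "x"), ("ss", "ss"),
              ("ing", ""), ("ed", ""), ("s", "")]


def _apply_rules(word, exceptions, rules):
    """Return the base form using the exception table, then the suffix rules."""
    word = word.lower()
    if word in exceptions:
        return exceptions[word]
    for suffix, replacement in rules:
        # Require a stem of at least two characters so short words like "sing"
        # are not stripped down to a single letter.
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: len(word) - len(suffix)] + replacement
    return word


def singularize(noun):
    """Map a (possibly plural) noun to its singular form."""
    return _apply_rules(noun, IRREGULAR_NOUNS, NOUN_RULES)


def is_plural(noun):
    """Treat a noun as plural if singularizing it changes it."""
    return singularize(noun) != noun.lower()


def get_infinitive(verb):
    """Map an inflected verb form to its base form."""
    return _apply_rules(verb, IRREGULAR_VERBS, VERB_RULES)


if __name__ == "__main__":
    for w in ["cats", "parties", "children", "glass"]:
        print(w, singularize(w), is_plural(w))
    for w in ["walked", "tries", "went", "watches"]:
        print(w, get_infinitive(w))
```

A sketch this small gets many words wrong: "making" becomes "mak", "stopped" becomes "stopp", and "bus" is treated as a plural. That is the tedious part mentioned in the question, and it is why a dedicated tool with a large rule set and exception lists is the better choice for real use.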

Kaarel
Thank you for telling me about morpha. I found an Ubuntu package for it, and that can do infinitives pretty easily. I still need to read the docs to figure out how to make it do the other things.
adi92