I'm looking for NLP tools for a project I'm working on, and so far the Stanford NLP projects have been the most useful ones I've found.

Does anyone know of other tools out there that would be useful for a language understander?

And more importantly, are there tools that are NOT out there?

Most specifically, I'm looking for an API for morphophonemic analysis and the like.

EDIT: I am an academic (a student working on a research project) and am mainly looking for open-source or, at least, open-API projects.

+3  A: 

NLTK is an interesting toolkit for building NLP-based applications. It can be used for practical applications that require, for example, POS tagging, or that implement simple classifiers or entity extractors.

I'm unsure what a "language understander" application would encompass, but it sounds like something that may be beyond what can [easily] be built on NLTK.
Reading the question in full, its reference to morphophonemics seems to confirm that NLTK would probably not serve the OP's purpose very well; to my knowledge, NLTK doesn't offer modules that deal with text at this level. You may want to check this for yourself, however, as NLTK is a broad and active project and may have seen recent additions in this area.

mjv
+2  A: 

I suggest you take a look at the following:

  1. The usual NLP libraries like OpenNLP, LingPipe, NLTK, GATE, and UIMA. All of these provide parsers and word stemmers (i.e. they don't give you back the root of a word, but its stem). Some also provide lemmatizers.
  2. Websites that collect NLP tools. These are but a few of them: the wiki of the Association for Computational Linguistics, Language Technology World, and the website of the computational linguistics department at Heidelberg University.

I'm not aware of a tool that returns the root of a word, but, as I said, there are stemmers and lemmatizers. For lemmatization, try TreeTagger or Morpha. "Morphophonemic analysis" is not a specific enough term to get you what you want.
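
To make the stem vs. lemma distinction concrete, here is a toy sketch in plain Python. The suffix rules and the tiny lexicon are made up for illustration, and the function names are hypothetical; this is not how any of the libraries mentioned above implement stemming or lemmatization.

```python
# Toy illustration only: the rules and the lexicon below are invented for
# this example and do not reproduce the algorithm of any real library.

# Hypothetical suffix list, checked in order.
SUFFIXES = ("ing", "ies", "ed", "es", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix; the result need not be a real word."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical mini-lexicon mapping inflected forms to dictionary forms.
LEMMA_LEXICON = {
    "studies": "study",
    "studying": "study",
    "ran": "run",
    "running": "run",
    "mice": "mouse",
}


def toy_lemmatize(word: str) -> str:
    """Look the word up in the lexicon; unknown words are returned unchanged."""
    word = word.lower()
    return LEMMA_LEXICON.get(word, word)


if __name__ == "__main__":
    for w in ("studies", "running", "ran", "mice"):
        print(f"{w}: stem={toy_stem(w)}, lemma={toy_lemmatize(w)}")
```

The stemmer only chops characters, so it can return non-words and cannot handle irregular forms, while the lemmatizer maps to a dictionary form but only for words its lexicon knows about.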

Once you know more specifically what you need, you could search the archives of the Corpora List or post a question there.

ferdystschenko
+3  A: 

I want to chime in with a link to the MontyLingua Python package, which can be found at the link below. I think it uses a different parser than NLTK.

http://www.fslog.com/2008/09/20/montylingua3-gpled-fork-of-montylingua/

You can google a comparison with NLTK.

tomcat23