I intend to use the SQL version of WordNet, and I have a problem finding a way to lemmatize words so that I can look them up in the DB. I can't use the WordNet lemmatizer itself because it works on the textual version of WordNet.

I've read here that there is a good lemmatizer that returns real words - and that's exactly what I need. I downloaded "Morpha", the suggested lemmatizer, but I don't understand how to use it.

  • Is any compilation needed?
  • Which file should I use?
  • How can I use it in an application that accesses the WordNet SQL DB?
A: 

Minnen et al.'s paper on Morpha might be a good place to start to understand how the lemmatizer works. It's been a while since I've had any experience with it myself, but I'm pretty sure it works as an off-the-shelf binary.

Depending on performance, you may need to POS-tag your terms beforehand, but that's about the same issue you'll have querying WordNet, so it's starting to sound like you'll need to climb that hill either way.

You would basically use the root form when querying the WordNet DB, but if you're using the lemmatizer just for that, I'd urge you to try the Morphy stemmer, which was specifically designed for WordNet and will reliably match the root forms listed therein.
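
To make the "root form, then query" idea concrete, here is a minimal Python sketch of a Morphy-style lookup: strip a known suffix, then keep the candidate only if it actually exists in the database. The suffix rules are the standard Morphy detachment rules, but everything else is an assumption for illustration: the `words` table with a `lemma` column, the in-memory SQLite database, and the helper names `in_db`, `lemmatize` and `demo_connection` are all hypothetical, so adjust them to whatever schema your WordNet SQL build uses. The sketch also skips Morphy's exception lists for irregular forms and does not check the part of speech in the database.

```python
import sqlite3

# Morphy-style detachment rules: (suffix, replacement) per part of speech.
RULES = {
    "n": [("s", ""), ("ses", "s"), ("xes", "x"), ("zes", "z"),
          ("ches", "ch"), ("shes", "sh"), ("men", "man"), ("ies", "y")],
    "v": [("s", ""), ("ies", "y"), ("es", "e"), ("es", ""),
          ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", "")],
    "a": [("er", ""), ("est", ""), ("er", "e"), ("est", "e")],
}


def in_db(conn, lemma):
    """Return True if the lemma exists in the (assumed) words table."""
    row = conn.execute(
        "SELECT 1 FROM words WHERE lemma = ? LIMIT 1", (lemma,)
    ).fetchone()
    return row is not None


def lemmatize(conn, word, pos):
    """Return the first candidate root form found in the DB, or None."""
    word = word.lower()
    if in_db(conn, word):
        return word  # already a root form
    for suffix, replacement in RULES.get(pos, []):
        if word.endswith(suffix) and len(word) > len(suffix):
            candidate = word[: -len(suffix)] + replacement
            if in_db(conn, candidate):
                return candidate
    return None


def demo_connection():
    """Build a tiny in-memory stand-in for the WordNet SQL DB (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE words (wordid INTEGER PRIMARY KEY, lemma TEXT UNIQUE)"
    )
    conn.executemany(
        "INSERT INTO words (lemma) VALUES (?)",
        [("dog",), ("church",), ("make",), ("city",)],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    for word, pos in [("dogs", "n"), ("churches", "n"), ("making", "v")]:
        print(word, "->", lemmatize(conn, word, pos))
```

Because every candidate is checked against the database before it is returned, the function only ever yields real words that are present in WordNet, which is the property the question asks for. The `pos` argument is where the output of a POS tagger would be plugged in.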

Robert Elwell
A: 

You might also want to check out TTT2, an NLP pipeline that tokenizes, lemmatizes, etc., either all in one pass or as separate steps. It is easy to use and well documented: http://www.ltg.ed.ac.uk/software/lt-ttt2

Don Tuggener