ansaurus

Question

Which word stemmer should I use in nltk?

Answer 1

+3 A:

RSLP is for Portuguese. I'm guessing you want English. Regexp would require you to develop your own stemming expressions, so I think that can be ignored as well. The WordnetStemmer (WordNetLemmatizer in newer NLTK releases) requires that you know the part of speech for the word, so you'd have to do POS tagging first in order to use it. I've used the Porter stemming algorithm and it's pretty good, but the Lancaster algorithm is newer, so it might be better. You might want to try using a combination of stemmers, where you choose the shortest stem from each stemmer. Anyway, bottom line is that PorterStemmer is a good default choice.
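The "shortest stem" idea could be sketched roughly as below. This is only an illustration: `shortest_stem` and `STEMMERS` are names made up for the example, and picking the shortest candidate is just one possible way to combine stemmers.

```python
from nltk.stem import LancasterStemmer, PorterStemmer

# Neither stemmer needs any downloaded corpus data.
STEMMERS = [PorterStemmer(), LancasterStemmer()]


def shortest_stem(word, stemmers=STEMMERS):
    """Hypothetical helper: return the shortest stem any stemmer produces."""
    candidates = [stemmer.stem(word) for stemmer in stemmers]
    # min() keeps the first candidate on ties, so list order decides them.
    return min(candidates, key=len)


if __name__ == "__main__":
    for word in ["running", "maximum", "generously"]:
        per_stemmer = [stemmer.stem(word) for stemmer in STEMMERS]
        print(word, per_stemmer, shortest_stem(word))
```

Since Lancaster tends to cut more aggressively than Porter, the shortest stem will often be the Lancaster one, so it is worth checking the results on your own vocabulary for over-stemming.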

Jacob 2009-08-14 23:21:41

Answer 2

A:

I am working on a similar project and really like your answer, Jacob. But can you please tell me: how do I go about tagging every word in (say) a paragraph with its relevant POS tag? Is there a function in NLTK that I can call to do that, or does it have to be done some other way (and if so, how)?

Thanks

=================

Being new to stackoverflow, I just discovered that deleting my post is not very easy. In any case, I just found a solution to this: the nltk.pos_tag(...) function
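For anyone landing here later, a minimal sketch of tagging a paragraph with `nltk.pos_tag` might look like the following. It assumes the tokenizer and tagger data have already been fetched once with `nltk.download(...)` (the exact resource names vary between NLTK versions). `tag_paragraph` and `penn_to_wordnet` are hypothetical helper names, and the tag mapping is a common simplification rather than anything NLTK provides.

```python
import nltk


def penn_to_wordnet(tag):
    """Hypothetical helper: map a Penn Treebank tag to a WordNet POS letter."""
    if tag.startswith("J"):
        return "a"  # adjective
    if tag.startswith("V"):
        return "v"  # verb
    if tag.startswith("R"):
        return "r"  # adverb
    return "n"      # fall back to noun for everything else


def tag_paragraph(text):
    """Tokenize text and return a list of (token, Penn Treebank tag) pairs.

    Assumes the NLTK tokenizer and tagger data are installed locally.
    """
    tokens = nltk.word_tokenize(text)
    return nltk.pos_tag(tokens)
```

Calling `tag_paragraph("The cats were running quickly.")` returns a list of `(word, tag)` tuples; each tag can then be passed through `penn_to_wordnet` to get the part-of-speech letter that the WordNet-based lemmatizer accepts.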

Thanks

inspectorG4dget 2009-10-31 20:16:57

Answer 3

+1 A:

It may be a bit different from what you are asking, but the Nodebox Linguistics library contains an is_emotive() function which seems to check words to see if they are recursive hyponyms of certain emotional words. From commonsense.py:

    ekman = ["anger", "disgust", "fear", "joy", "sadness", "surprise"]
    other = ["emotion", "feeling", "expression"]

Not a stemmer, but an interesting approach to check out.

tomcat23 2010-01-22 08:45:25
