Hi, I have a somewhat large document and want to do stop-word elimination and stemming on its words in Python. Does anyone know of an off-the-shelf package for this? If not, code that is fast enough for large documents is also welcome. Thanks

+4  A: 

NLTK supports this.

Ken Bloom
Yes, use NLTK. It's open source and runs on Windows, Mac, and Linux.
Steven Rumbalski
+2  A: 

If for some reason you don't want to use NLTK, you can try PyStemmer. For stop words, just download a list (search online for one) and filter out every word that appears in it.

lazy1
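
To make the "filter them out" step concrete, here is a rough sketch using only the standard library. The names `STOP_WORDS`, `TOKEN_RE`, `preprocess` and `toy_stem` are made up for illustration, the stop-word list is a tiny placeholder for a real downloaded one, and `toy_stem` is a crude stand-in rather than a real stemming algorithm. It assumes plain English text tokenized on letters and apostrophes.

```python
import re

# Tiny placeholder list (an assumption); replace it with a full downloaded list.
STOP_WORDS = frozenset({"a", "an", "and", "are", "in", "is", "of", "the", "to"})

# Naive tokenizer: runs of lowercase letters and apostrophes.
TOKEN_RE = re.compile(r"[a-z']+")


def preprocess(lines, stop_words=STOP_WORDS, stem=None):
    """Lazily yield tokens from an iterable of lines, skipping stop words.

    `stem` is an optional callable mapping one word to its stem.
    Working line by line keeps memory use flat for large documents.
    """
    for line in lines:
        for token in TOKEN_RE.findall(line.lower()):
            if token in stop_words:
                continue
            yield stem(token) if stem else token


def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: drops a trailing plural 's'."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


sample = ["The cats are in the garden", "and the dogs bark"]
print(list(preprocess(sample, stem=toy_stem)))
```

Because a file object is itself an iterable of lines, passing an open file as `lines` should stream the document instead of loading it whole. For real stemming, a callable such as PyStemmer's `Stemmer.Stemmer('english').stemWord` or NLTK's `PorterStemmer().stem` could be passed as `stem` in place of `toy_stem`, assuming the package is installed. Keeping the stop words in a set makes each membership check roughly constant time.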