I have a field in my database that can contain large blocks of text. I need to make this searchable but don't have the ability to use full-text searching. Instead, on update, I want my business layer to process the block of text and extract keywords from it, which I can save as searchable metadata. Ideally, these keywords could then be weighted based on the number of times they appear in the block of text. Naturally, words like "the", "and", "of", etc. should be discarded, as they just add a lot of noise to the search.

Are there tools or libraries in Python that can do this filtering or should I roll my own?
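For reference, the approach described above can be sketched in plain Python with just the standard library. This is a minimal illustration, not a polished solution: the hardcoded `STOPWORDS` set is a placeholder (a real stopword list, such as the one in NLTK's stopwords corpus, is much larger), and the tokenizing regex is deliberately naive.

```python
import re
from collections import Counter

# Tiny stopword list for illustration only; NLTK's stopwords corpus
# provides a far more complete English list.
STOPWORDS = {"the", "and", "of", "a", "an", "to", "in", "is", "it", "that", "this"}

def extract_keywords(text):
    """Return keywords weighted by frequency, with stopwords discarded."""
    words = re.findall(r"[a-z']+", text.lower())  # crude tokenizer
    return Counter(w for w in words if w not in STOPWORDS)

print(extract_keywords("The cat and the dog chased the cat."))
# Counter({'cat': 2, 'dog': 1, 'chased': 1})
```

The resulting `Counter` maps each keyword to its occurrence count, which can be stored directly as the weighted search metadata.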

+1  A: 

NLTK can help.

Ignacio Vazquez-Abrams
A little more information regarding my particular problem would be really nice. NLTK is a HUGE library, and it's difficult to figure out where to start.
Soviut