Suppose I give you a URL...can you analyze the words and spit out the "keywords" of that page? (besides using meta-tags)
Are there good open-source summarizers out there? (preferably Python)
A simple text summarizer: http://pythonwise.blogspot.com/2008/01/simple-text-summarizer.html
Algorithm:
1. For each word, calculate its frequency in the document
2. For each sentence in the document:
       score(sentence) = sum([freq(word) for word in sentence])
3. Print X top sentences such that their size < MAX_SUMMARY_SIZE
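The three steps above could be sketched roughly like this. It is a minimal illustration, not the code from the linked post: the regex-based sentence splitter and tokenizer, the helper names, and the reading of MAX_SUMMARY_SIZE as a limit on the combined character count of the chosen sentences are all my assumptions.

```python
import re
from collections import Counter

MAX_SUMMARY_SIZE = 300  # assumed: limit on total characters in the summary


def split_sentences(text):
    # Naive splitter (assumption): break on whitespace that follows ., ! or ?
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    # Lowercase runs of letters, digits and apostrophes
    return re.findall(r"[a-z0-9']+", text.lower())


def summarize(text, max_size=MAX_SUMMARY_SIZE):
    sentences = split_sentences(text)

    # Step 1: frequency of each word in the whole document
    freq = Counter(tokenize(text))

    # Step 2: score(sentence) = sum of the frequencies of its words
    scored = [
        (sum(freq[word] for word in tokenize(sentence)), index, sentence)
        for index, sentence in enumerate(sentences)
    ]

    # Step 3: take the highest-scoring sentences while the summary stays
    # under max_size; ties are broken by original position
    scored.sort(key=lambda item: (-item[0], item[1]))
    chosen = []
    size = 0
    for _score, index, sentence in scored:
        if size + len(sentence) < max_size:
            chosen.append((index, sentence))
            size += len(sentence)

    # Put the chosen sentences back in document order
    chosen.sort()
    return " ".join(sentence for _index, sentence in chosen)


if __name__ == "__main__":
    sample = ("Python is great. "
              "Python is a language and Python is popular. "
              "Cats sleep.")
    print(summarize(sample, max_size=50))
```

Note that a plain sum favors long sentences and common words like "is" and "the", so in practice you would probably want to drop stop words or normalize by sentence length.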
Frequency counts will get you some of the way, but Natural Language Processing techniques will usually give better results, because they use linguistic information (such as each word's part of speech) rather than raw counts alone.
topia.termextract uses a Part-Of-Speech (POS) tagging algorithm and is available from PyPI: http://pypi.python.org/pypi/topia.termextract/